Yeah, that's a lot of things to query. I've never needed to query all five of these sources at once, but I did need 3. I needed to join our Posthog logs (we export these to AWS as JSONL files) to our Supabase data (we use Supabase as a Postgres database). I also wanted to join in some data from a Google Sheet.
DuckDB is a versatile, high-performance, in-memory analytical database that supports querying a wide range of data sources and formats. It's open-source and we use it as the data warehouse at Definite. A huge bonus with DuckDB is it's integrations and file support. It seamlessly interacts with various data sources, including PostgreSQL, Parquet, JSON, CSV, and even Google Sheets. Here's the query:
ATTACH 'postgres://your_username_password_host:5432/postgres' AS pgdb (TYPE postgres, READ_ONLY);
with interests as (
select *
from
read_parquet('https://storage.googleapis.com/duck-demo-data/user_interests.parquet')
), user_preferences as (
select *
from read_json_auto(
'https://storage.googleapis.com/duck-demo-data/user_preferences.jsonl',
format = 'newline_delimited',
records = true
)
), user_details as (
SELECT * FROM pgdb.user_details
), users as (
select *
from read_csv_auto('https://storage.googleapis.com/duck-demo-data/users.csv')
), one_more_thing as (
SELECT *
FROM read_csv_auto(
'https://docs.google.com/spreadsheets/export?format=csv&id=1O-sbeSxCpzhzZj5iTRnOplZX-dIAiQJAeIO0mlh2kSU',
normalize_names=True
)
)
select
users.user_id,
users.name,
interests.interest,
user_preferences.theme,
user_preferences.language,
user_details.hobby,
one_more_thing.one_more_thing
from users left join
interests on users.user_id = interests.user_id left join
user_preferences on users.user_id = user_preferences.user_id left join
user_details on users.user_id = user_details.user_id left join
one_more_thing on users.user_id = one_more_thing.user_id
If you run this query on the DuckDB CLI, everything but Postgres will work because the GCS and Google Sheets URLs are public. Here's a version without Postgres:
with interests as (
select *
from
read_parquet('https://storage.googleapis.com/duck-demo-data/user_interests.parquet')
), user_preferences as (
select *
from read_json_auto(
'https://storage.googleapis.com/duck-demo-data/user_preferences.jsonl',
format = 'newline_delimited',
records = true
)
), users as (
select *
from read_csv_auto('https://storage.googleapis.com/duck-demo-data/users.csv')
), one_more_thing as (
SELECT *
FROM read_csv_auto(
'https://docs.google.com/spreadsheets/export?format=csv&id=1O-sbeSxCpzhzZj5iTRnOplZX-dIAiQJAeIO0mlh2kSU',
normalize_names=True
)
)
select
users.user_id,
users.name,
interests.interest,
user_preferences.theme,
user_preferences.language,
one_more_thing.one_more_thing
from users left join
interests on users.user_id = interests.user_id left join
user_preferences on users.user_id = user_preferences.user_id left join
one_more_thing on users.user_id = one_more_thing.user_id
Get the new standard in analytics. Sign up below or get in touch and we’ll set you up in under 30 minutes.