Self-hosted DuckLake
Set up DuckLake on your own storage infrastructure. Bring your own object storage for data, and choose between object storage or PostgreSQL for metadata.
This guide assumes you have an Ascend Instance with a Project and Workspace.
Option A: Object storage metadata
Store both metadata and data in your object storage — no database required.
connections/data_plane_ducklake.yaml
```yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        s3:
          root: s3://my-bucket/ducklake/metadata/
          aws_access_key_id: ${vaults.aws.access_key_id}
          aws_secret_access_key: ${vaults.aws.secret_access_key}
      data_connection:
        s3:
          root: s3://my-bucket/ducklake/data/
          aws_access_key_id: ${vaults.aws.access_key_id}
          aws_secret_access_key: ${vaults.aws.secret_access_key}
```
The metadata catalog is stored as a DuckDB database file in your object storage and is synced automatically between Flow runs.
GCS example
connections/data_plane_ducklake.yaml
```yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        gcs:
          root: gs://my-bucket/ducklake/metadata/
          key: ${vaults.gcp.service_account_key}
      data_connection:
        gcs:
          root: gs://my-bucket/ducklake/data/
          key: ${vaults.gcp.service_account_key}
```
ABFS example
connections/data_plane_ducklake.yaml
```yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        abfs:
          account: mystorageaccount
          root: abfss://container/ducklake/metadata/
          shared_key: ${vaults.azure.shared_key}
      data_connection:
        abfs:
          account: mystorageaccount
          root: abfss://container/ducklake/data/
          shared_key: ${vaults.azure.shared_key}
```
Option B: PostgreSQL metadata
Use PostgreSQL for metadata when you need shared access across multiple Flows or have existing PostgreSQL infrastructure.
Using Connection references
Reference existing Connections by name:
connections/data_plane_ducklake.yaml
```yaml
connection:
  duckdb:
    ducklake:
      metadata_connection_name: my_postgres_connection
      data_connection_name: my_s3_connection
```
Using inline connections
Define connections inline:
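As a sketch, an inline definition mirrors Option A, replacing the named references with full connection bodies. The `postgres` block and its field names below are assumptions for illustration; mirror the schema of your existing PostgreSQL Connection definition.

```yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        # Assumed inline PostgreSQL schema; adapt to your Connection definition.
        postgres:
          host: my-postgres.example.com
          port: 5432
          database: ducklake_metadata
          user: ${vaults.postgres.user}
          password: ${vaults.postgres.password}
      data_connection:
        s3:
          root: s3://my-bucket/ducklake/data/
          aws_access_key_id: ${vaults.aws.access_key_id}
          aws_secret_access_key: ${vaults.aws.secret_access_key}
```

Inline definitions keep the whole data plane in one file; Connection references are preferable when the same PostgreSQL or S3 credentials are shared across several Flows.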
Choosing a metadata backend
| Backend | Metadata storage | Best for |
|---|---|---|
| Object storage | DuckDB file in S3/GCS/ABFS | Simpler setup, single-Flow deployments |
| PostgreSQL | Dedicated database | Shared metadata across Flows, high concurrency |
Next steps
- Performance tuning — optimize with partitioning and data inlining
- Read data — connect to your data sources
- Transform data — process and transform your data
- Write data — output your transformed data