Self-hosted DuckLake

Set up DuckDB with your own storage infrastructure. Bring your own object storage for data, and choose between object storage and PostgreSQL for metadata.

This guide assumes you have an Ascend Instance with a Project and Workspace.

Option A: Object storage metadata

Store both metadata and data in your object storage, with no database required.

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        s3:
          root: s3://my-bucket/ducklake/metadata/
          aws_access_key_id: ${vaults.aws.access_key_id}
          aws_secret_access_key: ${vaults.aws.secret_access_key}
      data_connection:
        s3:
          root: s3://my-bucket/ducklake/data/
          aws_access_key_id: ${vaults.aws.access_key_id}
          aws_secret_access_key: ${vaults.aws.secret_access_key}

The metadata catalog is stored as a DuckDB file in your object storage, synced automatically between Flow runs.

GCS example

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        gcs:
          root: gs://my-bucket/ducklake/metadata/
          key: ${vaults.gcp.service_account_key}
      data_connection:
        gcs:
          root: gs://my-bucket/ducklake/data/
          key: ${vaults.gcp.service_account_key}

ABFS example

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        abfs:
          account: mystorageaccount
          root: abfss://container/ducklake/metadata/
          shared_key: ${vaults.azure.shared_key}
      data_connection:
        abfs:
          account: mystorageaccount
          root: abfss://container/ducklake/data/
          shared_key: ${vaults.azure.shared_key}

Option B: PostgreSQL metadata

Use PostgreSQL for metadata when you need shared access across multiple Flows or have existing PostgreSQL infrastructure.

Using Connection references

Reference existing Connections by name:

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      metadata_connection_name: my_postgres_connection
      data_connection_name: my_s3_connection
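
For context, the two referenced Connections might look like the sketch below. This is an assumption, not confirmed configuration: it supposes that each Connection lives in its own file under connections/, that a Connection's name matches its file name, and that standalone postgres and s3 Connections accept the same keys as the inline blocks shown elsewhere in this guide. The file names are hypothetical.

```yaml
# connections/my_postgres_connection.yaml (hypothetical file name)
# Assumes a standalone Connection uses the same keys as the inline postgres block.
connection:
  postgres:
    host: my-postgres-host.example.com
    port: 5432
    database: ducklake_metadata
    user: ${vaults.postgres.user}
    password: ${vaults.postgres.password}
---
# connections/my_s3_connection.yaml (hypothetical file name)
# Assumes a standalone Connection uses the same keys as the inline s3 block.
connection:
  s3:
    root: s3://my-bucket/ducklake/data/
    aws_access_key_id: ${vaults.aws.access_key_id}
    aws_secret_access_key: ${vaults.aws.secret_access_key}
```

The `---` separator only marks the boundary between the two files in this sketch. Each definition would go in its own file.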

Using inline connections

Define connections inline:

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      metadata_connection:
        postgres:
          host: my-postgres-host.example.com
          port: 5432
          database: ducklake_metadata
          user: ${vaults.postgres.user}
          password: ${vaults.postgres.password}
      data_connection:
        s3:
          root: s3://my-bucket/ducklake/data/
          aws_access_key_id: ${vaults.aws.access_key_id}
          aws_secret_access_key: ${vaults.aws.secret_access_key}
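
Because data lives in your own object storage, PostgreSQL metadata can presumably be paired with a data store other than S3. The following is a sketch of that variation, assuming the gcs block accepts the same keys under data_connection as in the Option A GCS example and that this combination is supported in your environment:

```yaml
# Sketch: PostgreSQL metadata with GCS data (combination assumed, not confirmed).
connection:
  duckdb:
    ducklake:
      metadata_connection:
        postgres:
          host: my-postgres-host.example.com
          port: 5432
          database: ducklake_metadata
          user: ${vaults.postgres.user}
          password: ${vaults.postgres.password}
      data_connection:
        gcs:
          # Keys mirror the Option A GCS example.
          root: gs://my-bucket/ducklake/data/
          key: ${vaults.gcp.service_account_key}
```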

Choosing a metadata backend

| Backend | Metadata storage | Best for |
| --- | --- | --- |
| Object storage | DuckDB file in S3/GCS/ABFS | Simpler setup, single-Flow deployments |
| PostgreSQL | Dedicated database | Shared metadata across Flows, high concurrency |

Next steps