
Self-hosted DuckLake

Set up DuckDB in Ascend with your own DuckLake infrastructure. You manage the PostgreSQL instance and object storage while Ascend coordinates the processing.

Prerequisites

Overview

In Ascend, DuckLake runs ephemeral DuckDB processes directly on Flow runners while all data and metadata are stored remotely. This gives you the benefits of in-process compute together with reliable, centralized data and metadata management. The setup uses three Connections:

  1. PostgreSQL Connection → Stores metadata and schema information
  2. Object store Connection → Stores actual data files (Parquet format)
  3. DuckDB Connection with DuckLake catalog → Coordinates processing between metadata and data
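For orientation, the three pieces above roughly mirror how a DuckLake catalog is attached in plain DuckDB outside Ascend: a PostgreSQL catalog string for metadata plus a DATA_PATH for the Parquet files. The sketch below only builds the SQL text and does not connect to anything. The helper name build_attach_sql, the host, the database name, and the bucket path are all illustrative assumptions; in Ascend the DuckDB Connection handles this wiring for you.

```python
def build_attach_sql(pg_dbname: str, pg_host: str, data_path: str, alias: str = "lake") -> str:
    """Return a DuckLake ATTACH statement for a PostgreSQL catalog (illustrative helper)."""
    # Metadata lives in PostgreSQL; data files (Parquet) live under DATA_PATH.
    catalog = f"ducklake:postgres:dbname={pg_dbname} host={pg_host}"
    return f"ATTACH '{catalog}' AS {alias} (DATA_PATH '{data_path}');"


# Hypothetical values, not taken from this guide.
sql = build_attach_sql("ducklake_catalog", "db.example.internal", "gs://example-bucket/ducklake/")
print(sql)
```

The statement shows the division of labor: the catalog string points at the metadata store, DATA_PATH points at object storage, and DuckDB sits between the two.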

This guide assumes your PostgreSQL and object store Connections are already configured. Name them consistently:

  • PostgreSQL Connection: data_plane_ducklake_metadata.yaml (or similar)
  • Object store Connection: data_plane_ducklake_data_gcs.yaml (or similar; adjust suffix for your provider)

Configure Project parameters

DuckDB and DuckLake require specific parameters to coordinate between your data and metadata Connections. Add these parameters to your Ascend Project file:

ascend_project.yaml
...
parameters:
  data_planes:
    ducklake:
      data_connection_name: data_plane_ducklake_data_gcs
      metadata_connection_name: data_plane_ducklake_metadata

Now that your Project parameters are configured, the next step is to create the DuckDB Connection that integrates a DuckLake catalog.

Create a DuckDB Connection

From your Workspace Super Graph view, follow these steps:

  1. Create a Connection by either:
    • Clicking the + button next to Connections in the left Build panel
    • Right-clicking in the Super Graph and selecting Create Connection
  2. Enter a descriptive name like data_plane_ducklake (the name used for the Connection file in the rest of this guide)
  3. Select DuckDB from the available options
  4. Fill in the required fields (and any optional fields as needed)
  5. Click Save at the bottom to create your Connection

For complete configuration options, see our Connection reference guide.

Your DuckDB Connection file should look like this:

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      data_connection_name: ${parameters.data_planes.ducklake.data_connection_name}
      metadata_connection_name: ${parameters.data_planes.ducklake.metadata_connection_name}

This Connection configuration references the parameters you defined earlier and automatically coordinates between your PostgreSQL metadata store and object storage.
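To make the parameter references concrete, here is a minimal Python sketch of how a ${parameters...} placeholder maps onto the nested keys in ascend_project.yaml. Ascend performs this substitution itself; the resolve function and the regular expression are illustrative assumptions and not part of Ascend's API.

```python
import re

# Mirrors the parameters block from ascend_project.yaml in this guide.
parameters = {
    "data_planes": {
        "ducklake": {
            "data_connection_name": "data_plane_ducklake_data_gcs",
            "metadata_connection_name": "data_plane_ducklake_metadata",
        }
    }
}

# Matches placeholders of the form ${parameters.a.b.c}.
PLACEHOLDER = re.compile(r"\$\{parameters\.([A-Za-z0-9_.]+)\}")


def resolve(text: str, params: dict) -> str:
    """Replace each ${parameters.a.b.c} with params["a"]["b"]["c"]."""

    def lookup(match: re.Match) -> str:
        value = params
        for key in match.group(1).split("."):
            value = value[key]  # a KeyError here means the path is not defined
        return str(value)

    return PLACEHOLDER.sub(lookup, text)


resolved = resolve("${parameters.data_planes.ducklake.metadata_connection_name}", parameters)
print(resolved)
```

A placeholder whose dotted path does not exist in the parameters raises a KeyError in this sketch, which is the kind of mismatch to look for if a Connection fails to resolve.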

Verify your setup

Test all three Connections to verify they're configured correctly:

  1. PostgreSQL Connection (data_plane_ducklake_metadata.yaml)
  2. Object store Connection (e.g. data_plane_ducklake_data_gcs.yaml)
  3. DuckLake catalog Connection (data_plane_ducklake.yaml)
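Before testing, a quick local check can catch name mismatches between the Project parameters and the Connection files. The sketch below assumes that a Connection's name equals its file name without the .yaml suffix and that Connection files live in a connections/ directory; the missing_connections helper is hypothetical and not an Ascend tool.

```python
from pathlib import Path


def missing_connections(referenced: list[str], connections_dir: Path) -> list[str]:
    """Return referenced Connection names that have no matching <name>.yaml file."""
    # Assumption: a Connection's name equals its file name without the .yaml suffix.
    available = {path.stem for path in connections_dir.glob("*.yaml")}
    return [name for name in referenced if name not in available]


if __name__ == "__main__":
    # Names taken from the ducklake parameters in ascend_project.yaml.
    names = ["data_plane_ducklake_data_gcs", "data_plane_ducklake_metadata"]
    print(missing_connections(names, Path("connections")))
```

An empty list means every referenced name has a matching file; any name that is printed points to a typo or a missing Connection.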

Optimize performance with concurrency

For optimal performance with DuckLake, configure concurrency settings to balance throughput with reliability.

🎉 Congratulations, you just set up a DuckDB Connection with a DuckLake catalog in Ascend!

Now that your DuckDB Connection is set up: