DuckDB Data Plane with DuckLake

Run DuckDB processing in Ascend using DuckLake, an open lakehouse format that stores data in Parquet files with metadata managed separately.

How it works

DuckLake runs ephemeral DuckDB instances directly on Flow runners. All data and metadata are stored remotely, giving you in-process compute performance with centralized data management.

Each Flow run (see the sketch after this list):

  1. Spins up a DuckDB instance on the Flow runner
  2. Connects to the metadata catalog (DuckDB file or PostgreSQL)
  3. Reads from and writes to object storage (S3, GCS, or ABFS)
  4. Shuts down when complete — no persistent infrastructure
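
These steps map onto plain DuckDB calls. The sketch below illustrates the equivalent flow using the DuckDB Python API with the DuckLake extension; the catalog file name, bucket path, and table are placeholders, and object-store credentials (for example, a DuckDB secret) are assumed to be configured separately.

```python
import duckdb

# 1. Spin up an ephemeral in-process DuckDB instance
#    (analogous to what runs on the Flow runner).
con = duckdb.connect()

# Make the DuckLake extension available.
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# 2. Connect to the metadata catalog. Here the catalog is a DuckDB file;
#    a PostgreSQL catalog would use a 'ducklake:postgres:...' connection string.
# 3. Data files are written as Parquet under DATA_PATH in object storage.
#    'metadata.ducklake' and the bucket path are illustrative placeholders.
con.execute("""
    ATTACH 'ducklake:metadata.ducklake' AS lake
        (DATA_PATH 's3://example-bucket/lakehouse/')
""")

# Read and write through the attached catalog like any other DuckDB database.
con.execute("CREATE TABLE lake.events AS SELECT 42 AS id, 'hello' AS payload")
print(con.execute("SELECT * FROM lake.events").fetchall())

# 4. Shut down: closing the connection discards the local instance,
#    while the metadata catalog and Parquet data persist remotely.
con.close()
```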

Choose your setup

| Setup | Description | Best for |
| --- | --- | --- |
| Ascend-managed | Ascend handles all infrastructure | Most users — zero setup, fastest path |
| Self-hosted | Bring your own storage and metadata | Existing infrastructure, compliance requirements |
| Performance tuning | Partitioning and data inlining | Optimizing query performance |

Ascend-managed Instances include a default DuckLake Connection that works well for small to medium workloads. See Ascend-managed DuckLake for available tuning options, or self-hosted DuckLake to bring your own storage.