DuckDB Data Plane with DuckLake
Run DuckDB processing in Ascend using DuckLake, an open lakehouse format that stores table data as Parquet files in object storage and tracks metadata in a separate catalog database.
How it works
DuckLake runs ephemeral DuckDB instances directly on Flow runners. All data and metadata are stored remotely, giving you in-process compute performance with centralized data management.
Each Flow run (see the sketch after this list):
- Spins up a DuckDB instance on the Flow runner
- Connects to the metadata catalog (DuckDB file or PostgreSQL)
- Reads and writes data in object storage (S3, GCS, ABFS)
- Shuts down when complete — no persistent infrastructure
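The snippet below is a minimal sketch of that lifecycle using DuckDB's Python API and the DuckLake extension. The catalog file, bucket path, and table name are placeholders rather than Ascend defaults, and object-storage credentials (for example a DuckDB S3 secret) are assumed to be configured separately.

```python
import duckdb

# Ephemeral, in-process DuckDB instance for the duration of one run.
con = duckdb.connect()

# Load the DuckLake extension and attach the lakehouse.
# Catalog and bucket below are placeholders; S3 access assumes a secret
# (e.g. CREATE SECRET) has already been configured.
con.execute("INSTALL ducklake;")
con.execute("LOAD ducklake;")
con.execute("""
    ATTACH 'ducklake:catalog.ducklake' AS lake
        (DATA_PATH 's3://example-bucket/lakehouse/');
""")

# Writes produce Parquet files in object storage; the attached catalog
# records which files make up each table version.
con.execute("CREATE TABLE IF NOT EXISTS lake.events (id INTEGER, name VARCHAR);")
con.execute("INSERT INTO lake.events VALUES (1, 'started');")
print(con.execute("SELECT count(*) FROM lake.events").fetchone())

# Close the connection; nothing persists on the runner itself.
con.close()
```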
Choose your setup
| Setup | Description | Best for |
|---|---|---|
| Ascend-managed | Ascend handles all infrastructure | Most users — zero setup, fastest path |
| Self-hosted | Bring your own storage and metadata | Existing infrastructure, compliance requirements |
| Performance tuning | Partitioning and data inlining | Optimizing query performance |
Ascend-managed Instances include a default DuckLake Connection that works well for small to medium workloads. See Ascend-managed DuckLake for available tuning options, or self-hosted DuckLake to bring your own storage.
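As an illustration of the self-hosted path, the sketch below attaches a DuckLake catalog backed by your own PostgreSQL database and object storage, then sets a partition key as a simple tuning example. Host, database, bucket, table, and column names are placeholders; a PostgreSQL-backed catalog also requires DuckDB's postgres extension, a reachable database, and object-store credentials configured ahead of time.

```python
import duckdb

con = duckdb.connect()
# DuckLake plus the postgres extension for a PostgreSQL-backed catalog.
con.execute("INSTALL ducklake;")
con.execute("LOAD ducklake;")
con.execute("INSTALL postgres;")
con.execute("LOAD postgres;")

# Placeholder connection string and bucket; credentials for PostgreSQL and
# the object store come from your own secrets/configuration.
con.execute("""
    ATTACH 'ducklake:postgres:host=pg.internal dbname=ducklake_catalog' AS lake
        (DATA_PATH 's3://my-company-lake/ducklake/');
""")

# Example tuning step: partition a table so reads can prune Parquet files.
con.execute("CREATE TABLE IF NOT EXISTS lake.events (id INTEGER, event_date DATE);")
con.execute("ALTER TABLE lake.events SET PARTITIONED BY (event_date);")

con.close()
```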