DuckDB Data Plane with DuckLake

Run DuckDB processing in Ascend using DuckLake, an open lakehouse format that stores data in Parquet files with metadata managed separately.

How it works

DuckLake runs ephemeral DuckDB instances directly on Flow runners. All data and metadata are stored remotely, giving you in-process compute performance with centralized data management.

Each Flow run (see the sketch after this list):

  1. Spins up a DuckDB instance on the Flow runner
  2. Connects to the metadata catalog (DuckDB file or PostgreSQL)
  3. Reads from and writes to object storage (S3, GCS, or ABFS)
  4. Shuts down when complete — no persistent infrastructure
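
These steps map onto plain DuckDB calls. The sketch below illustrates the equivalent flow using the DuckDB Python API with the DuckLake extension; the catalog file name, bucket path, and table are placeholders, and object-store credentials (for example, a DuckDB secret) are assumed to be configured separately.

```python
import duckdb

# 1. Spin up an ephemeral in-process DuckDB instance
#    (analogous to what runs on the Flow runner).
con = duckdb.connect()

# Make the DuckLake extension available.
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# 2. Connect to the metadata catalog. Here the catalog is a DuckDB file;
#    a PostgreSQL catalog would use a 'ducklake:postgres:...' connection string.
# 3. Data files are written as Parquet under DATA_PATH in object storage.
#    'metadata.ducklake' and the bucket path are illustrative placeholders.
con.execute("""
    ATTACH 'ducklake:metadata.ducklake' AS lake
        (DATA_PATH 's3://example-bucket/lakehouse/')
""")

# Read and write through the attached catalog like any other DuckDB database.
con.execute("CREATE TABLE lake.events AS SELECT 42 AS id, 'hello' AS payload")
print(con.execute("SELECT * FROM lake.events").fetchall())

# 4. Shut down: closing the connection discards the local instance,
#    while the metadata catalog and Parquet data persist remotely.
con.close()
```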

Choose your setup

| Setup | Description | Best for |
| --- | --- | --- |
| Ascend-managed | Ascend handles all infrastructure | Most users — zero setup, fastest path |
| Self-hosted | Bring your own storage and metadata | Existing infrastructure, compliance requirements |
| Performance tuning | Partitioning and data inlining | Optimizing query performance |

Ascend-managed Instances include a default DuckLake Connection that works well for small to medium workloads. See Ascend-managed DuckLake for available tuning options, or self-hosted DuckLake to bring your own storage.