
DuckDB with DuckLake in Ascend

DuckLake is a lakehouse format that combines data lake flexibility with data warehouse reliability. It uses a standard SQL database for metadata management while storing data in Parquet files in a filesystem or object storage. For complete technical details, see the official DuckLake documentation.

Connections

When using DuckLake with Ascend, you create a single DuckDB with DuckLake Connection that serves as your Data Plane. This Connection executes queries with DuckDB and coordinates the catalog through DuckDB's DuckLake extension.

Your DuckDB with DuckLake Connection requires two accessory Connections:

  • Metadata storage Connection: External database (PostgreSQL)
  • Data storage Connection: Object storage (S3, GCS, ABFS)

This separation lets you scale data storage, metadata storage, and compute independently while keeping ACID guarantees.
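Outside of Ascend, DuckLake expresses this same pairing of a metadata database and a data path in a single DuckDB `ATTACH` statement. The sketch below only builds that statement as a string, to make the two-part structure concrete. It is not Ascend's Connection format: the `MetadataStore` class, the `build_attach_sql` helper, the `lake` alias, the database name, and the bucket path are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataStore:
    """Hypothetical PostgreSQL settings for the DuckLake catalog."""
    dbname: str
    host: str
    user: str = ""


def sql_quote(value: str) -> str:
    """Wrap a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_attach_sql(meta: MetadataStore, data_path: str, alias: str = "lake") -> str:
    """Build an ATTACH statement pairing a PostgreSQL catalog with a data path."""
    if not alias.isidentifier():
        raise ValueError(f"alias must be a plain identifier: {alias!r}")
    parts = [f"dbname={meta.dbname}", f"host={meta.host}"]
    if meta.user:
        parts.append(f"user={meta.user}")
    # Metadata lives in PostgreSQL; Parquet data files live under DATA_PATH.
    target = "ducklake:postgres:" + " ".join(parts)
    return f"ATTACH {sql_quote(target)} AS {alias} (DATA_PATH {sql_quote(data_path)});"


# Placeholder values for illustration only.
attach_sql = build_attach_sql(
    MetadataStore(dbname="ducklake_catalog", host="localhost"),
    data_path="s3://example-bucket/lake/",
)
print(attach_sql)
```

In Ascend, the metadata and data storage Connections supply these two pieces, so you would not normally write the statement by hand.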

Note: Ascend currently supports PostgreSQL for DuckLake metadata storage. While DuckLake itself supports additional databases like MySQL, these aren't yet available in Ascend. For details on DuckLake's full database support, see the DuckLake documentation.

Architecture

DuckDB differs significantly from other Ascend Data Planes in its architectural approach:

Other Data Planes

DuckLake via Ascend

Key differences

DuckDB with DuckLake differs from other Ascend Data Planes in two fundamental ways.

Query execution happens in your Ascend Flow runner. Unlike other Data Planes where compute is pushed down to the platform's infrastructure, DuckDB processes all queries directly within your Ascend Flow runner. This approach gives you direct control over compute resources and costs, although it means you'll be limited by available memory for very large aggregations or joins.
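Because queries run inside the runner's own memory, DuckDB's general-purpose settings for capping memory and spilling to disk become relevant. The sketch below generates the corresponding `SET` statements for `memory_limit`, `temp_directory`, and `threads`, which are standard DuckDB settings. Whether and how an Ascend Flow runner exposes them is an assumption not covered here, and the `runner_settings` helper, the values, and the spill path are hypothetical.

```python
def runner_settings(memory_limit_gb: int, spill_dir: str, threads: int) -> list[str]:
    """Return DuckDB SET statements that bound memory use and enable spilling."""
    if memory_limit_gb <= 0 or threads <= 0:
        raise ValueError("memory_limit_gb and threads must be positive")
    safe_dir = spill_dir.replace("'", "''")  # escape quotes for the SQL literal
    return [
        # Cap the memory DuckDB may use inside the runner.
        f"SET memory_limit = '{memory_limit_gb}GB';",
        # Directory where DuckDB can write temporary files when it spills.
        f"SET temp_directory = '{safe_dir}';",
        # Limit parallelism to the cores available to the runner.
        f"SET threads = {threads};",
    ]


# Placeholder values for illustration only.
settings = runner_settings(memory_limit_gb=4, spill_dir="/tmp/duckdb_spill", threads=2)
for statement in settings:
    print(statement)
```

With a DuckDB connection in hand, each statement could be passed to the connection's `execute` method before running heavy joins or aggregations.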

Clean separation of metadata and data storage. DuckLake stores all metadata in a standard SQL database while keeping data as Parquet files in object storage. This architecture allows you to scale metadata storage, data storage, and compute independently. You can choose Ascend-managed infrastructure or bring your own PostgreSQL database and object storage, giving you flexibility that's particularly valuable for teams without existing data warehouse infrastructure.

The tradeoff is runner-bound memory in exchange for direct control over compute cost and independently scalable storage, which is often worthwhile for teams seeking cost-effective data infrastructure. Ascend's orchestration then runs on top of this architecture.

Next steps

To get started, set up your DuckDB with DuckLake Connection and begin building pipelines on it.