DuckLake performance tuning
Optimize DuckLake query performance and storage efficiency with table partitioning and data inlining.
Table partitioning
Partition tables by columns or expressions to enable partition pruning — DuckLake skips reading data files that don't match your query filters.
Configure `partition_by` in your Profile under Data Plane options:
```yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: .*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - region
```
SQL expressions are also supported:
```yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: .*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - year(created_at)
                - month(created_at)
```
You can also set `partition_by` at the Flow or Component level for more granular control.
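For instance, a single Flow could override the Profile-wide default with its own partition columns. The sketch below is illustrative only: it assumes a Flow's configuration accepts the same `data_plane.duckdb.ducklake` options shown in the Profile examples above, and `event_date` is a hypothetical column.

```yaml
# Illustrative Flow-level override. The nesting under `flow` is an assumption
# that mirrors the Profile's spec.data_plane block; `event_date` is a
# hypothetical partition column for this Flow's tables.
flow:
  data_plane:
    connection_name: data_plane_ducklake
    duckdb:
      ducklake:
        partition_by:
          - event_date
```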
When to use: Large tables frequently filtered by specific columns. Partitioning by date columns is common for time-series data.
Data inlining
Store small tables directly in the metadata catalog instead of separate Parquet files. This reduces read latency for dimension tables and lookup data.
Configure `data_inlining_row_limit` in your Connection:
```yaml
connection:
  duckdb:
    ducklake:
      $<: $ascend_managed.duckdb_ducklake
      data_inlining_row_limit: 100000
```
Tables with fewer rows than the limit are stored inline in the metadata catalog.
| Setting | Default | Description |
|---|---|---|
| `data_inlining_row_limit` | 0 | Row-count threshold for inlining. Set to 0 to disable. |
When to use: Small dimension tables (under 100k rows) that are frequently joined.