DuckLake performance tuning

Optimize DuckLake query performance and storage efficiency with table partitioning and data inlining.

Table partitioning

Partition tables by columns or expressions to enable partition pruning — DuckLake skips reading data files that don't match your query filters.

Configure partition_by in your Profile under Data Plane options:

profiles/my-profile.yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: .*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - region

SQL expressions are also supported:

profiles/my-profile.yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: .*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - year(created_at)
                - month(created_at)

You can also set partition_by at the Flow or Component level for more granular control.
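For example, a Flow-level override might look like the following. This is a minimal sketch: the flows/events.yaml path, the flow key layout, and the event_date column are illustrative, and it assumes the Flow spec accepts the same data_plane options shown in the Profile examples above.

flows/events.yaml
flow:
  data_plane:
    connection_name: data_plane_ducklake
    duckdb:
      ducklake:
        partition_by:
          - event_date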

When to use: Large tables frequently filtered by specific columns. Partitioning by date columns is common for time-series data.
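As a sketch of scoping partitioning to only your time-series Flows, you can tighten the name regex in the Profile default; the events_.* pattern and created_at column below are illustrative.

profiles/my-profile.yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: events_.*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - year(created_at)
                - month(created_at)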

Data inlining

Store small tables directly in the metadata catalog instead of separate Parquet files. This reduces read latency for dimension tables and lookup data.

Configure data_inlining_row_limit in your Connection:

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      $<: $ascend_managed.duckdb_ducklake
      data_inlining_row_limit: 100000

Tables with fewer rows than the limit are stored inline in the metadata catalog.

| Setting | Default | Description |
|---------|---------|-------------|
| data_inlining_row_limit | 0 | Row threshold for inlining. Set to 0 to disable. |

When to use: Small dimension tables (under 100k rows) that are frequently joined.