DuckLake performance tuning

Optimize DuckLake query performance and storage efficiency with table partitioning and data inlining.

Table partitioning

Partition tables by columns or expressions to enable partition pruning — DuckLake skips reading data files that don't match your query filters.

Configure partition_by in your Profile under Data Plane options:

profiles/my-profile.yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: .*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - region

SQL expressions are also supported:

profiles/my-profile.yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: .*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - year(created_at)
                - month(created_at)

You can also set partition_by at the Flow or Component level for more granular control.
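For example, a Flow-level override might look like the following. This is a minimal sketch: the flows/events.yaml path, the flow key layout, and the event_date column are illustrative, and it assumes the Flow spec accepts the same data_plane options shown in the Profile examples above.

flows/events.yaml
flow:
  data_plane:
    connection_name: data_plane_ducklake
    duckdb:
      ducklake:
        partition_by:
          - event_date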

When to use: Large tables frequently filtered by specific columns. Partitioning by date columns is common for time-series data.
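As a sketch of scoping partitioning to only your time-series Flows, you can tighten the name regex in the Profile default; the events_.* pattern and created_at column below are illustrative.

profiles/my-profile.yaml
profile:
  defaults:
    - kind: Flow
      name:
        regex: events_.*
      spec:
        data_plane:
          connection_name: data_plane_ducklake
          duckdb:
            ducklake:
              partition_by:
                - year(created_at)
                - month(created_at)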

Data inlining

Store small tables directly in the metadata catalog instead of separate Parquet files. This reduces read latency for dimension tables and lookup data.

Configure data_inlining_row_limit in your Connection:

connections/data_plane_ducklake.yaml
connection:
  duckdb:
    ducklake:
      $<: $ascend_managed.duckdb_ducklake
      data_inlining_row_limit: 100000

Tables with fewer rows than the limit are stored inline in the metadata catalog.

| Setting | Default | Description |
|---------|---------|-------------|
| data_inlining_row_limit | 0 | Row threshold for inlining. Set to 0 to disable. |

When to use: Small dimension tables (under 100k rows) that are frequently joined.