Profile

A Profile is a set of configuration options and parameters that define the target where customer code is compiled/run.

Examples

profile_parameters.yaml
profile_skip_slow_components.yaml
profile_flow_data_plane_config.yaml

profile:
  parameters:
    param1: value1
    param2: value2

profile:
  defaults:
    - kind: Component
      name:
        regex: "^(?!slow).*"
      spec:
        skip: true

profile:
  defaults:
    - kind: Flow
      name:
        regex: "^my-flow-.*$"
      spec:
        data_plane:
          connection_name: "my-connection"

Profile

Below are the properties for the Profile. Each property links to the specific details section further down in this page.

Property	Default	Type	Required	Description
profile			Yes	Options and parameters for Profiles.

Property Details

ProfileOptions

Configuration options and parameters for Profiles.

Property	Type	Required	Description
pip_packages	array[string]	No	Python PIP packages to install
parameters	object with property values of type None	No	Dictionary of parameters to use for resource.
defaults	array[None]	No	List of default configs with filters that can be applied to a resource config.
description	string	No	Brief description of what the model does.
metadata		No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name	string	Yes	The name of the model
ignore	array[string]	No	Additional ignore patterns to apply when using this profile (follows .gitignore syntax)

ConfigFilter

Filter used to target configuration settings to a specific Flow and/or Component.

Property	Type	Required	Description
kind	string ("Flow", "Component")	Yes	Resource kind to target with this configuration.
name	Any of: string array[string] array[None]	Yes	Name of the resource to target with this configuration.
flow_name	Any of: string array[string] array[None]	No	Name of the Flow to target with this configuration.
spec	Any of:	No	Dictionary of parameters to use for the resource.

ComponentSpec

Specification for configuration applied to a component at runtime based on the config filter.

Property	Type	Required	Description
skip	boolean	No	Boolean flag indicating whether to skip processing for the Component or not.
retry_strategy		No	Retry strategy configuration options for the Component if any exceptions are encountered.
data_maintenance		No	The data maintenance configuration options for the Component.
data_plane	One of: SnowflakeDataPlane BigQueryDataPlane DuckdbDataPlane DatabricksDataPlane	No	Data Plane-specific configuration options for Components.

FlowSpec

Specification for configuration applied to a Flow at runtime based on the config filter.

Property	Type	Required	Description
data_plane		No	The data plane that will be used for the flow at runtime.
runner	RunnerConfig	No	Runner configuration.
component_concurrency	integer	No	Maximum number of concurrent Components to run within this Flow.

DataPlane

The external warehouse where data is persisted throughout the Flow runs, and where primary computation on the data itself occurs.

Property	Default	Type	Required	Description
connection_name		string	No
metadata_storage_location_prefix		string	No	Prefix to prepend to the names of metadata tables created for this Flow. The prefix may include database/project/etc. and schema/dataset/etc where applicable. If not provided, metadata tables are stored alongside the output data tables per the Data Plane's Connection configuration.

RegexFilter

A filter used to target resources based on a regex pattern.

Property	Default	Type	Required	Description
regex		string	Yes	The regex to filter the resources.

RunnerConfig

Configuration for the flow runner

Property	Default	Type	Required	Description
size		Any of: RuntimeSize CustomRuntimeSize	No	Runtime size configuration. Can be: (1) a tier name string (X-Small, Small, Medium, Large, X-Large), or (2) a CustomRuntimeSize object with tier-based or fully custom resources.

CustomRuntimeSize

Runtime size configuration with flexible resource specification. Supports two modes: 1. Tier-based: Specify a tier with optional resource overrides 2. Fully custom: Specify CPU directly with optional memory/disk Either 'tier' or 'cpu' must be provided (or both).

Property	Type	Required	Description
tier	RuntimeSize	No	Base size tier (X-Small, Small, Medium, Large, X-Large). Required unless 'cpu' is specified.
cpu	string	No	CPU allocation in whole cores (e.g., '1', '4', '8'). Required unless 'tier' is specified.
memory	string	No	Memory allocation. Use 'high' for tier-based doubling, or specify exact value with unit suffix (e.g., '32Gi', '4G', '512Mi')
disk	string	No	Disk allocation with unit suffix (e.g., '100Gi', '1Ti', '500G')

RuntimeSize

Enumeration of standard runtime size tiers. Each tier corresponds to specific resource allocations (CPU, memory, disk).

No properties defined.

BigQueryDataPlane

Property	Default	Type	Required	Description
bigquery			Yes	BigQuery configuration options.

BigQueryDataPlaneOptions

Property	Default	Type	Required	Description
partition_by		Any of:	No	Partition By clause for the table.
cluster_by		array[string]	No	Clustering keys to be added to the table.

BigQueryRangePartitioning

Property	Default	Type	Required	Description
field		string	Yes	Field to partition by.
range			Yes	Range partitioning options.

BigQueryTimePartitioning

Property	Default	Type	Required	Description
field		string	Yes	Field to partition by.
granularity		string ("DAY", "HOUR", "MONTH", "YEAR")	Yes	Granularity of the time partitioning.

DatabricksDataPlane

Property	Default	Type	Required	Description
databricks	cluster_by: null pyspark_job_cluster_id: null table_properties: null		No	Databricks configuration options.

DatabricksDataPlaneOptions

Property	Type	Required	Description
table_properties	object with property values of type string	No	Table properties to include when creating the data table. This setting is equivalent to the `CREATE TABLE ... TBLPROPERTIES` clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your Data Plane.
pyspark_job_cluster_id	string	No	ID of the compute cluster to use for PySpark jobs.
cluster_by	array[string]	No	Clustering keys to be added to the table.

DuckdbDataPlane

Property	Default	Type	Required	Description
duckdb	ducklake_data_table_compaction: small_file_count_threshold: 50 small_file_ratio_threshold: 0.25 small_file_record_count_limit: 100000 ducklake_metadata_table_compaction: small_file_count_threshold: 10 small_file_ratio_threshold: null small_file_record_count_limit: 10		No	DuckDB configuration options.

DuckDbDataPlaneOptions

Property	Default	Type	Required	Description
ducklake_metadata_table_compaction	small_file_count_threshold: 10 small_file_ratio_threshold: null small_file_record_count_limit: 10	DuckLakeTableCompactionSettings	No	Settings for compacting metadata tables. If present, metadata table compaction is enabled.
ducklake_data_table_compaction	small_file_count_threshold: 50 small_file_ratio_threshold: 0.25 small_file_record_count_limit: 100000	DuckLakeTableCompactionSettings	No	Settings for compacting data tables. If present, data table compaction is enabled.

DuckLakeTableCompactionSettings

Settings for DuckLake table compaction.

Property	Default	Type	Required	Description
small_file_record_count_limit	10	integer	No	Files with fewer records than this limit are considered 'small'.
small_file_count_threshold	10	integer	No	Run manual table compaction if the number of files with fewer than `small_file_record_count_limit` records exceeds this threshold.
small_file_ratio_threshold		number	No	Percentage (0.0-1.0) of small files relative to total files. If set, both absolute count AND ratio must pass for compaction to be triggered. If None, only absolute count check is performed.

FabricDataPlane

Property	Default	Type	Required	Description
fabric	spark_session_config: null		No	Fabric configuration options.

FabricDataPlaneOptions

Property	Default	Type	Required	Description
spark_session_config			No	Spark session configuration.

RangeOptions

Property	Type	Required	Description
start	integer	Yes	Start of the range partitioning.
end	integer	Yes	End of the range partitioning.
interval	integer	Yes	Interval of the range partitioning.

SnowflakeDataPlane

Property	Default	Type	Required	Description
snowflake			Yes	Snowflake configuration options.

SnowflakeDataPlaneOptions

Property	Default	Type	Required	Description
cluster_by		array[string]	No	Clustering keys to be added to the table.

SynapseDataPlane

Property	Default	Type	Required	Description
synapse	spark_session_config: null		No	Synapse configuration options.

SynapseDataPlaneOptions

Property	Default	Type	Required	Description
spark_session_config			No	Spark session configuration.

LivySparkSessionConfig

Property	Type	Required	Description
pool	string	No	Pool to use for the Spark session.
driver_memory	string	No	Memory to use for the Spark driver.
driver_cores	integer	No	Number of cores to use for the Spark driver.
executor_memory	string	No	Memory to use for the Spark executor.
executor_cores	integer	No	Number of cores to use for each executor.
num_executors	integer	No	Number of executors to use for the Spark session.
session_key_override	string	No	Key to use for the Spark session.
max_concurrent_sessions	integer	No	Maximum number of concurrent sessions allowed for this configuration.

DataMaintenance

Data maintenance configuration options for Components.

Property	Default	Type	Required	Description
enabled		boolean	No	Boolean flag indicating whether data maintenance is enabled for the Component.
manual_table_compaction		boolean	No	Boolean flag indicating whether manual table compaction is enabled for the Component. This is currently only relevant for DuckLake Data Planes.
manual_table_compaction_record_count_threshold	10	integer	No	Consider files with fewer than this number of records in determining whether to perform manual table compaction. This is currently only relevant for DuckLake Data Planes.
manual_table_compaction_file_count_threshold	10	integer	No	Run manual table compaction if the number of files with fewer than `manual_table_compaction_record_count_threshold` records exceeds this threshold. This is currently only relevant for DuckLake Data Planes.

ResourceMetadata

Meta information of a resource. In most cases, it doesn't affect the system behavior but may be helpful to analyze Project resources.

Property	Default	Type	Required	Description
source			No	The origin or source information for the resource.
source_event_uuid		string	No	Event UUID associated with creation of this resource.

ResourceLocation

The origin or source information for the resource.

Property	Default	Type	Required	Description
path		string	Yes	Path within repository files where the resource is defined.
first_line_number		integer	No	First line number within `path` file where the resource is defined.

RetryStrategy

Retry strategy configuration for Component operations. This configuration leverages the tenacity library to implement robust retry mechanisms. The configuration options directly map to tenacity's retry parameters. Details on the tenacity library can be found here: https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api Current implementation includes: - stop_after_attempt: Maximum number of retry attempts - stop_after_delay: Give up on retries one attempt before you would exceed the delay. Will need to supply at least one of the two parameters. Additional retry parameters will be added as needed to support more complex use cases.

Property	Default	Type	Required	Description
stop_after_attempt		integer	No	Number of retry attempts before giving up. If set to None, it will not stop after any number of attempts.
stop_after_delay		integer	No	Maximum time (in seconds) to spend on retries before giving up. If set to None, it will not stop after any time delay.

Examples​

Profile​

Property Details​

ProfileOptions​

ConfigFilter​

ComponentSpec​

FlowSpec​

DataPlane​

RegexFilter​

RunnerConfig​

CustomRuntimeSize​

RuntimeSize​

BigQueryDataPlane​

BigQueryDataPlaneOptions​

BigQueryRangePartitioning​

BigQueryTimePartitioning​

DatabricksDataPlane​

DatabricksDataPlaneOptions​

DuckdbDataPlane​

DuckDbDataPlaneOptions​

DuckLakeTableCompactionSettings​

FabricDataPlane​

FabricDataPlaneOptions​

RangeOptions​

SnowflakeDataPlane​

SnowflakeDataPlaneOptions​

SynapseDataPlane​

SynapseDataPlaneOptions​

LivySparkSessionConfig​

DataMaintenance​

ResourceMetadata​

ResourceLocation​

RetryStrategy​

Examples

Profile

Property Details

ProfileOptions

ConfigFilter

ComponentSpec

FlowSpec

DataPlane

RegexFilter

RunnerConfig

CustomRuntimeSize

RuntimeSize

BigQueryDataPlane

BigQueryDataPlaneOptions

BigQueryRangePartitioning

BigQueryTimePartitioning

DatabricksDataPlane

DatabricksDataPlaneOptions

DuckdbDataPlane

DuckDbDataPlaneOptions

DuckLakeTableCompactionSettings

FabricDataPlane

FabricDataPlaneOptions

RangeOptions

SnowflakeDataPlane

SnowflakeDataPlaneOptions

SynapseDataPlane

SynapseDataPlaneOptions

LivySparkSessionConfig

DataMaintenance

ResourceMetadata

ResourceLocation

RetryStrategy