
Flow Run

Defines the run-specific parameters for a Flow; one Flow can have multiple Flow runs.

FlowRun

Below are the properties for the FlowRun. Each property links to its details section further down on this page.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| flow_run | | FlowRunOptions | Yes | |

Property Details

FlowRunOptions

Options for a Flow Run

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| parameters | | object with property values of type None | No | Dictionary of parameters to use for the resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the model does. |
| metadata | | FlowRunMetadata | No | Meta-information of a Flow run. In most cases, it doesn't affect the system behavior but may be helpful to analyze project resources. |
| run_tests | True | boolean | No | Boolean flag indicating whether to run tests after processing data. |
| store_test_results | | boolean | No | Boolean flag indicating whether to store test results. |
| components | | array[string] | No | List of Component names to run. |
| component_categories | | array[string] | No | List of Component categories to run. |
| halt_flow_on_error | | boolean | No | Boolean flag indicating whether to halt the Flow on error. |
| disable_optimizers | | boolean | No | Boolean flag indicating whether to disable optimizers. |
| disable_incremental_metadata_collection | | boolean | No | Boolean flag indicating whether to disable collection of Incremental Read and Transform Component metadata. |
| full_refresh | False | boolean | No | Boolean flag indicating whether to perform a full refresh of each Component. ⚠ If true, all internal data and metadata tables/views are dropped and re-computed from scratch. |
| update_materialization_type | False | boolean | No | Boolean flag indicating whether to update Component materialization types (e.g., changing between 'simple', 'view', 'incremental', and 'smart'). ⚠ If materialization type changes are detected, existing data and metadata tables/views are dropped and re-computed from scratch. Otherwise, existing data and metadata tables/views are preserved and type changes result in an error. |
| backfill_missing_statistics | True | boolean | No | Boolean flag indicating whether to backfill block statistics for existing data blocks that don't have statistics yet. If true (the default), statistics are computed and stored for data blocks that don't have them. |
| runner_overrides | | RunnerConfig | No | Override runner configuration for this specific Flow run. If not specified, inherits from the Flow's runner configuration, or the deployment/workspace defaults. |
| name | | string | No | Flow run name. |
| flow_name | | string | Yes | Name of the Flow to run. |
| event_start_time | | string | No | Event start time to be used for time-series processing. |
| event_end_time | | string | No | Event end time to be used for time-series processing. |
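
A run request is just these options assembled together. As a minimal sketch in Python (all names and values are hypothetical; only the key names follow the FlowRunOptions table above):

```python
# Hypothetical FlowRun request body. Key names follow the FlowRunOptions
# table above; all values are illustrative.
flow_run = {
    "flow_name": "daily_sales",            # required: name of the Flow to run
    "name": "daily_sales_2024_01_01",      # optional Flow run name
    "run_tests": True,                     # defaults to True
    "full_refresh": False,                 # True would drop and re-compute all tables
    "components": ["load_orders", "transform_orders"],
    "event_start_time": "2024-01-01T00:00:00Z",
    "event_end_time": "2024-01-02T00:00:00Z",
}

# flow_name is the only required property.
assert "flow_name" in flow_run
```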

FlowRunMetadata

Meta-information of a flow run. In most cases, it doesn't affect the system behavior but may be helpful to analyze project resources.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| source | | ResourceLocation | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | Event UUID associated with the creation of this resource. |
| backfill_run | | string | No | Name of the backfill run that scheduled this Flow run. |
| created_by | | User | No | User who scheduled this Flow run. |

ConfigFilter

Filter used to target configuration settings to a specific Flow and/or Component.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| kind | | string ("Flow", "Component") | Yes | Resource kind to target with this configuration. |
| name | | Any of: string, array[string], array[None] | Yes | Name of the resource to target with this configuration. |
| flow_name | | Any of: string, array[string], array[None] | No | Name of the Flow to target with this configuration. |
| spec | | Any of: FlowSpec, ComponentSpec | No | Dictionary of parameters to use for the resource. |
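
A `defaults` entry pairs a ConfigFilter with a spec that is applied to every matching resource. The sketch below illustrates the matching rules; the dictionary contents and the helper function are hypothetical, not part of the actual API:

```python
# Hypothetical `defaults` entry: a ConfigFilter plus a spec applied to
# every matching resource. All names are illustrative.
default_config = {
    "kind": "Component",                        # "Flow" or "Component"
    "name": ["load_orders", "load_customers"],  # a string or a list of names
    "flow_name": "daily_sales",                 # optionally narrow to one Flow
    "spec": {                                   # ComponentSpec-shaped settings
        "skip": False,
        "retry_strategy": {"stop_after_attempt": 3},
    },
}

def matches(filter_cfg, kind, name, flow_name=None):
    """Simplified sketch of ConfigFilter matching: kind must match, name must
    be targeted, and flow_name (when present in the filter) must match too."""
    if filter_cfg["kind"] != kind:
        return False
    names = filter_cfg["name"]
    if isinstance(names, str):
        names = [names]
    if name not in names:
        return False
    target_flow = filter_cfg.get("flow_name")
    return target_flow is None or flow_name == target_flow
```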

ComponentSpec

Specification for configuration applied to a component at runtime based on the config filter.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component. |
| retry_strategy | | RetryStrategy | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the Component. |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |

FlowSpec

Specification for configuration applied to a Flow at runtime based on the config filter.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | DataPlane | No | The data plane that will be used for the Flow at runtime. |
| runner | | RunnerConfig | No | Runner configuration. |
| component_concurrency | | integer | No | Maximum number of concurrent Components to run within this Flow. |

DataPlane

The external warehouse where data is persisted throughout the Flow runs, and where primary computation on the data itself occurs.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this Flow. The prefix may include database/project/etc. and schema/dataset/etc. where applicable. If not provided, metadata tables are stored alongside the output data tables per the Data Plane's Connection configuration. |

RegexFilter

A filter used to target resources based on a regex pattern.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| regex | | string | Yes | The regex to filter the resources. |

RunnerConfig

Configuration for the Flow runner.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| size | | Any of: RuntimeSize, CustomRuntimeSize | No | Runtime size configuration. Can be: (1) a tier name string (X-Small, Small, Medium, Large, X-Large), or (2) a CustomRuntimeSize object with tier-based or fully custom resources. |

CustomRuntimeSize

Runtime size configuration with flexible resource specification. Supports two modes:

1. Tier-based: specify a tier with optional resource overrides.
2. Fully custom: specify CPU directly with optional memory/disk.

Either 'tier' or 'cpu' must be provided (or both).

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| tier | | RuntimeSize | No | Base size tier (X-Small, Small, Medium, Large, X-Large). Required unless 'cpu' is specified. |
| cpu | | string | No | CPU allocation in whole cores (e.g., '1', '4', '8'). Required unless 'tier' is specified. |
| memory | | string | No | Memory allocation. Use 'high' for tier-based doubling, or specify an exact value with a unit suffix (e.g., '32Gi', '4G', '512Mi'). |
| disk | | string | No | Disk allocation with a unit suffix (e.g., '100Gi', '1Ti', '500G'). |
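
The "either 'tier' or 'cpu'" rule can be sketched as a small validator (a hypothetical helper, not the actual implementation):

```python
# Sketch of the CustomRuntimeSize rule described above: either `tier` or
# `cpu` must be provided (or both). Tier names come from the table;
# the helper itself is hypothetical.
VALID_TIERS = {"X-Small", "Small", "Medium", "Large", "X-Large"}

def validate_runtime_size(size):
    tier, cpu = size.get("tier"), size.get("cpu")
    if tier is None and cpu is None:
        raise ValueError("either 'tier' or 'cpu' must be provided")
    if tier is not None and tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return size

validate_runtime_size({"tier": "Medium", "memory": "high"})              # tier-based
validate_runtime_size({"cpu": "4", "memory": "32Gi", "disk": "100Gi"})   # fully custom
```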

RuntimeSize

Enumeration of standard runtime size tiers. Each tier corresponds to specific resource allocations (CPU, memory, disk).

No properties defined.

BigQueryDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |

BigQueryDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

BigQueryRangePartitioning

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| field | | string | Yes | Field to partition by. |
| range | | RangeOptions | Yes | Range partitioning options. |

BigQueryTimePartitioning

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |

DatabricksDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| databricks | `{cluster_by: null, pyspark_job_cluster_id: null, table_properties: null}` | DatabricksDataPlaneOptions | No | Databricks configuration options. |

DatabricksDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your Data Plane. |
| pyspark_job_cluster_id | | string | No | ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

DuckdbDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| duckdb | `{ducklake_data_table_compaction: {small_file_count_threshold: 50, small_file_ratio_threshold: 0.25, small_file_record_count_limit: 100000}, ducklake_metadata_table_compaction: {small_file_count_threshold: 10, small_file_ratio_threshold: null, small_file_record_count_limit: 10}}` | DuckDbDataPlaneOptions | No | DuckDB configuration options. |

DuckDbDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| ducklake_metadata_table_compaction | `{small_file_count_threshold: 10, small_file_ratio_threshold: null, small_file_record_count_limit: 10}` | DuckLakeTableCompactionSettings | No | Settings for compacting metadata tables. If present, metadata table compaction is enabled. |
| ducklake_data_table_compaction | `{small_file_count_threshold: 50, small_file_ratio_threshold: 0.25, small_file_record_count_limit: 100000}` | DuckLakeTableCompactionSettings | No | Settings for compacting data tables. If present, data table compaction is enabled. |

DuckLakeTableCompactionSettings

Settings for DuckLake table compaction.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| small_file_record_count_limit | 10 | integer | No | Files with fewer records than this limit are considered 'small'. |
| small_file_count_threshold | 10 | integer | No | Run manual table compaction if the number of files with fewer than small_file_record_count_limit records exceeds this threshold. |
| small_file_ratio_threshold | | number | No | Percentage (0.0-1.0) of small files relative to total files. If set, both the absolute count AND the ratio must pass for compaction to be triggered. If None, only the absolute count check is performed. |
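
The thresholds combine as described in the table: the absolute count check must always pass, and the ratio check applies only when set. A plain-Python sketch of that decision (a hypothetical helper, not the actual implementation; "exceeds" is assumed to mean strictly greater than):

```python
# Sketch of the DuckLake compaction trigger described above. A file is
# "small" if it has fewer records than small_file_record_count_limit;
# compaction runs when the small-file count exceeds
# small_file_count_threshold and, if small_file_ratio_threshold is set,
# the small-file ratio also exceeds it.
def should_compact(file_record_counts, settings):
    limit = settings.get("small_file_record_count_limit", 10)
    count_threshold = settings.get("small_file_count_threshold", 10)
    ratio_threshold = settings.get("small_file_ratio_threshold")

    small = sum(1 for n in file_record_counts if n < limit)
    if small <= count_threshold:
        return False
    if ratio_threshold is not None:
        return small / len(file_record_counts) > ratio_threshold
    return True

# Defaults shown for data tables in the DuckDbDataPlaneOptions table.
data_table_settings = {
    "small_file_record_count_limit": 100000,
    "small_file_count_threshold": 50,
    "small_file_ratio_threshold": 0.25,
}
```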

FabricDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| fabric | `{spark_session_config: null}` | FabricDataPlaneOptions | No | Fabric configuration options. |

FabricDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |

RangeOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |
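
For integer range partitioning, start/end/interval define evenly spaced partition boundaries. A sketch of how the boundaries fall (illustrative only, not BigQuery's implementation):

```python
# Illustration of RangeOptions: boundaries start at `start` and step by
# `interval` up to (but not including) `end`. Hypothetical helper.
def partition_boundaries(start, end, interval):
    return list(range(start, end, interval))

partition_boundaries(0, 100, 25)  # [0, 25, 50, 75]
```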

SnowflakeDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| snowflake | | SnowflakeDataPlaneOptions | Yes | Snowflake configuration options. |

SnowflakeDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

SynapseDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| synapse | `{spark_session_config: null}` | SynapseDataPlaneOptions | No | Synapse configuration options. |

SynapseDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |

LivySparkSessionConfig

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| pool | | string | No | Pool to use for the Spark session. |
| driver_memory | | string | No | Memory to use for the Spark driver. |
| driver_cores | | integer | No | Number of cores to use for the Spark driver. |
| executor_memory | | string | No | Memory to use for the Spark executor. |
| executor_cores | | integer | No | Number of cores to use for each executor. |
| num_executors | | integer | No | Number of executors to use for the Spark session. |
| session_key_override | | string | No | Key to use for the Spark session. |
| max_concurrent_sessions | | integer | No | Maximum number of concurrent sessions allowed for this configuration. |

User

User information.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| name | | string | No | |
| email | | string | No | |
| service_account | | string | No | |

DataMaintenance

Data maintenance configuration options for Components.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| enabled | | boolean | No | Boolean flag indicating whether data maintenance is enabled for the Component. |
| manual_table_compaction | | boolean | No | Boolean flag indicating whether manual table compaction is enabled for the Component. This is currently only relevant for DuckLake Data Planes. |
| manual_table_compaction_record_count_threshold | 10 | integer | No | Consider files with fewer than this number of records in determining whether to perform manual table compaction. This is currently only relevant for DuckLake Data Planes. |
| manual_table_compaction_file_count_threshold | 10 | integer | No | Run manual table compaction if the number of files with fewer than manual_table_compaction_record_count_threshold records exceeds this threshold. This is currently only relevant for DuckLake Data Planes. |

ResourceLocation

The origin or source information for the resource.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| path | | string | Yes | Path within repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within the path file where the resource is defined. |

RetryStrategy

Retry strategy configuration for Component operations. This configuration leverages the tenacity library to implement robust retry mechanisms, and the configuration options map directly to tenacity's retry parameters. Details on the tenacity library can be found here: https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api

The current implementation includes:

- stop_after_attempt: maximum number of retry attempts.
- stop_after_delay: give up on retries one attempt before you would exceed the delay.

At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| stop_after_attempt | | integer | No | Number of retry attempts before giving up. If set to None, it will not stop after any number of attempts. |
| stop_after_delay | | integer | No | Maximum time (in seconds) to spend on retries before giving up. If set to None, it will not stop after any time delay. |
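
These two options correspond to tenacity's stop_after_attempt and stop_after_delay stop conditions. A simplified plain-Python sketch of how the combined stop decision behaves (a hypothetical helper, not the actual implementation; tenacity's real stop_after_delay semantics differ slightly, giving up one attempt before the delay would be exceeded):

```python
def make_stop_condition(stop_after_attempt=None, stop_after_delay=None):
    """Return a predicate deciding whether to give up retrying, given the
    attempt number and elapsed seconds (simplified sketch)."""
    if stop_after_attempt is None and stop_after_delay is None:
        # Mirrors the requirement above: supply at least one of the two.
        raise ValueError("supply stop_after_attempt and/or stop_after_delay")

    def should_stop(attempt, elapsed_seconds):
        # Stop when either limit is reached; unset limits never trigger.
        if stop_after_attempt is not None and attempt >= stop_after_attempt:
            return True
        if stop_after_delay is not None and elapsed_seconds >= stop_after_delay:
            return True
        return False

    return should_stop

stop = make_stop_condition(stop_after_attempt=3, stop_after_delay=60)
```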