Backfill Run

Defines the parameters for a backfill run.

BackfillRun

Below are the properties for the BackfillRun. Each property links to the specific details section further down in this page.

Property	Default	Type	Required	Description
backfill_run			Yes	Backfill run options.

Property Details

BackfillRunOptions

Options for a backfill run.

Property	Default	Type	Required	Description
description		string	No	Brief description of what the backfill run does.
metadata			No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name		string	Yes	Name of the backfill run.
flow_name		string	Yes	Name of the Flow that is to be backfilled.
start_time		string	Yes	Start time of the time range to be backfilled.
end_time		string	Yes	End time of the time range to be backfilled.
granularity		string ("day", "week", "month")	Yes	Time granularity to use for backfill. Must be one of: 'day', 'week', 'month'. The backfill runner divides the date range into Flow runs of this granularity and launches these Flow runs.
max_concurrent_flow_runs	1	integer	No	Maximum number of concurrent Flow runs used for backfill. This is used to limit the number of Flow runners (and hence cluster resources) that are launched simultaneously.
backfill_order		string ("forward_chronological", "reverse_chronological")	No	Order to use for backfilling - either forward or reverse chronological order.
flow_run_options			No	Additional options for each Flow run launched during the backfill.
run_final_sync		boolean	No	Boolean flag indicating whether to run a final sync after concurrent backfill Flow runs. This final sync is a single Flow run that is executed without any time parameters, and is meant to sync the data to the latest state and capture any missing time intervals.
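
The granularity and backfill_order descriptions above can be illustrated with a minimal sketch of how a date range might be divided into per-run windows. This is not the backfill runner's actual implementation: the function names, the end-exclusive windows, and the alignment of monthly windows to calendar month starts are all assumptions made for illustration.

```python
from datetime import date, timedelta


def add_one_step(d: date, granularity: str) -> date:
    """Return the start of the next window after `d` (assumed stepping rule)."""
    if granularity == "day":
        return d + timedelta(days=1)
    if granularity == "week":
        return d + timedelta(weeks=1)
    if granularity == "month":
        # Assumption: monthly windows align to the first day of the next month.
        year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
        return date(year, month, 1)
    raise ValueError(f"granularity must be 'day', 'week', or 'month', got {granularity!r}")


def split_range(start: date, end: date, granularity: str,
                order: str = "forward_chronological"):
    """Split [start, end) into consecutive windows, one per Flow run."""
    windows = []
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past `end`.
        nxt = min(add_one_step(cursor, granularity), end)
        windows.append((cursor, nxt))
        cursor = nxt
    if order == "reverse_chronological":
        windows.reverse()
    return windows


windows = split_range(date(2024, 1, 15), date(2024, 4, 1), "month")
for window_start, window_end in windows:
    print(window_start, "->", window_end)
```

With max_concurrent_flow_runs left at its default of 1, windows like these would be processed one at a time; a higher value would allow several to run simultaneously.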

FlowRunBaseOptions

Base options for a Flow run.

Property	Default	Type	Required	Description
parameters		object with property values of type None	No	Dictionary of parameters to use for resource.
defaults		array[None]	No	List of default configs with filters that can be applied to a resource config.
description		string	No	Brief description of what the model does.
metadata			No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
run_tests	True	boolean	No	Boolean flag indicating whether to run tests after processing data.
store_test_results		boolean	No	Boolean flag indicating whether to store test results.
components		array[string]	No	List of Component names to run.
component_categories		array[string]	No	List of Component categories to run.
halt_flow_on_error		boolean	No	Boolean flag indicating whether to halt the Flow on error.
disable_optimizers		boolean	No	Boolean flag indicating whether to disable optimizers.
disable_incremental_metadata_collection		boolean	No	Boolean flag indicating whether to disable collection of Incremental Read and Transform Component metadata.
full_refresh	False	boolean	No	Boolean flag indicating whether to perform a full refresh of each Component. ⚠ If true, will drop all internal data and metadata tables/views and re-compute them from scratch.
update_materialization_type	False	boolean	No	Boolean flag indicating whether to update Component materialization types (e.g., changing types between 'simple', 'view', 'incremental', and 'smart'). ⚠ If true and materialization type changes are detected, existing data and metadata tables/views will be dropped and re-computed from scratch. If false, existing data and metadata tables/views will be preserved and type changes will result in an error.
runner_overrides		RunnerConfig	No	Override runner configuration for this specific flow run. If not specified, inherits from the flow's runner configuration, or the deployment/workspace defaults.

ConfigFilter

Filter used to target configuration settings to a specific Flow and/or Component.

Property	Type	Required	Description
kind	string ("Flow", "Component")	Yes	Resource kind to target with this configuration.
name	Any of: string, array[string], array[None]	Yes	Name of the resource to target with this configuration.
flow_name	string	No	Name of the Flow to target with this configuration.
spec	Any of:	No	Dictionary of parameters to use for the resource.
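
To make the targeting rules in this table concrete, here is a minimal sketch of how a filter could be matched against a resource. The matching logic, the `filter_matches` helper, and the example names (`orders`, `customers`, `sales`) are hypothetical; only the property names `kind`, `name`, `flow_name`, and `spec` come from the table above. The sketch handles `name` as a string or a list of strings and does not cover other forms.

```python
from typing import Optional


def filter_matches(config_filter: dict, kind: str, name: str,
                   flow_name: Optional[str] = None) -> bool:
    """Return True if a resource of `kind` called `name` is targeted by the filter."""
    if config_filter["kind"] != kind:
        return False
    names = config_filter["name"]
    if isinstance(names, str):
        # A single name is treated the same as a one-element list.
        names = [names]
    if name not in names:
        return False
    # Assumption: an omitted flow_name means the filter applies in any Flow.
    wanted_flow = config_filter.get("flow_name")
    return wanted_flow is None or wanted_flow == flow_name


# Hypothetical filter that skips two Components in one Flow.
example_filter = {
    "kind": "Component",
    "name": ["orders", "customers"],
    "flow_name": "sales",
    "spec": {"skip": True},
}

print(filter_matches(example_filter, "Component", "orders", "sales"))
print(filter_matches(example_filter, "Component", "orders", "marketing"))
```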

ComponentSpec

Specification for configuration applied to a component at runtime based on the config filter.

Property	Type	Required	Description
skip	boolean	No	Boolean flag indicating whether to skip processing for the Component or not.
retry_strategy		No	Retry strategy configuration options for the Component if any exceptions are encountered.
data_plane	One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane	No	Data Plane-specific configuration options for Components.

FlowSpec

Specification for configuration applied to a Flow at runtime based on the config filter.

Property	Default	Type	Required	Description
data_plane			No	The data plane that will be used for the flow at runtime.
component_concurrency		integer	No	Maximum number of concurrent Components to run within this Flow.

DataPlane

The external warehouse where data is persisted throughout the Flow runs, and where primary computation on the data itself occurs.

Property	Default	Type	Required	Description
connection_name		string	No
metadata_storage_location_prefix		string	No	Prefix to prepend to the names of metadata tables created for this Flow. The prefix may include database/project/etc. and schema/dataset/etc where applicable. If not provided, metadata tables are stored alongside the output data tables per the Data Plane's Connection configuration.

RegexFilter

A filter used to target resources based on a regex pattern.

Property	Default	Type	Required	Description
regex		string	Yes	The regex to filter the resources.

RunnerConfig

Configuration for the flow runner.

Property	Default	Type	Required	Description
size		RuntimeSize	No	Override the size of the flow runner. If not specified, the flow runner inherits the size from the deployment or workspace.

RuntimeSize

Enumeration of available runtime sizes for deployments, workspaces, and flow runners. Each size corresponds to specific resource allocations (CPU, memory, disk).

No properties defined.

BigQueryDataPlane

Property	Default	Type	Required	Description
bigquery			Yes	BigQuery configuration options.

BigQueryDataPlaneOptions

Property	Default	Type	Required	Description
partition_by		Any of:	No	Partition By clause for the table.
cluster_by		array[string]	No	Clustering keys to be added to the table.

BigQueryRangePartitioning

Property	Default	Type	Required	Description
field		string	Yes	Field to partition by.
range			Yes	Range partitioning options.

BigQueryTimePartitioning

Property	Default	Type	Required	Description
field		string	Yes	Field to partition by.
granularity		string ("DAY", "HOUR", "MONTH", "YEAR")	Yes	Granularity of the time partitioning.

DatabricksDataPlane

Property	Default	Type	Required	Description
databricks	cluster_by: null, pyspark_job_cluster_id: null, table_properties: null		No	Databricks configuration options.

DatabricksDataPlaneOptions

Property	Type	Required	Description
table_properties	object with property values of type string	No	Table properties to include when creating the data table. This setting is equivalent to the `CREATE TABLE ... TBLPROPERTIES` clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your Data Plane.
pyspark_job_cluster_id	string	No	ID of the compute cluster to use for PySpark jobs.
cluster_by	array[string]	No	Clustering keys to be added to the table.

DuckdbDataPlane

Property	Default	Type	Required	Description
duckdb			No	DuckDB configuration options.

DuckDbDataPlaneOptions

No properties defined.

FabricDataPlane

Property	Default	Type	Required	Description
fabric	spark_session_config: null		No	Fabric configuration options.

FabricDataPlaneOptions

Property	Default	Type	Required	Description
spark_session_config			No	Spark session configuration.

RangeOptions

Property	Type	Required	Description
start	integer	Yes	Start of the range partitioning.
end	integer	Yes	End of the range partitioning.
interval	integer	Yes	Interval of the range partitioning.
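
As a rough illustration of how `start`, `end`, and `interval` relate, the sketch below lists the partition boundaries they would produce. It assumes the usual start-inclusive, end-exclusive convention for integer range partitioning; the `range_partition_bounds` helper is hypothetical and is not part of any API described on this page.

```python
def range_partition_bounds(start: int, end: int, interval: int):
    """List (lower, upper) bounds, lower inclusive and upper exclusive."""
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    if end <= start:
        raise ValueError("end must be greater than start")
    # Assumption: the last partition is clamped so it never extends past `end`.
    return [(lower, min(lower + interval, end)) for lower in range(start, end, interval)]


bounds = range_partition_bounds(start=0, end=100, interval=10)
print(len(bounds), bounds[0], bounds[-1])
```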

SnowflakeDataPlane

Property	Default	Type	Required	Description
snowflake			Yes	Snowflake configuration options.

SnowflakeDataPlaneOptions

Property	Default	Type	Required	Description
cluster_by		array[string]	No	Clustering keys to be added to the table.

SynapseDataPlane

Property	Default	Type	Required	Description
synapse	spark_session_config: null		No	Synapse configuration options.

SynapseDataPlaneOptions

Property	Default	Type	Required	Description
spark_session_config			No	Spark session configuration.

LivySparkSessionConfig

Property	Type	Required	Description
pool	string	No	Pool to use for the Spark session.
driver_memory	string	No	Memory to use for the Spark driver.
driver_cores	integer	No	Number of cores to use for the Spark driver.
executor_memory	string	No	Memory to use for the Spark executor.
executor_cores	integer	No	Number of cores to use for each executor.
num_executors	integer	No	Number of executors to use for the Spark session.
session_key_override	string	No	Key to use for the Spark session.
max_concurrent_sessions	integer	No	Maximum number of concurrent sessions allowed for this configuration.

ResourceMetadata

Meta information of a resource. In most cases, it doesn't affect the system behavior but may be helpful to analyze Project resources.

Property	Default	Type	Required	Description
source			No	The origin or source information for the resource.
source_event_uuid		string	No	Event UUID associated with creation of this resource.

ResourceLocation

The origin or source information for the resource.

Property	Default	Type	Required	Description
path		string	Yes	Path within repository files where the resource is defined.
first_line_number		integer	No	First line number within `path` file where the resource is defined.

RetryStrategy

Retry strategy configuration for Component operations. This configuration uses the tenacity library to implement retries, and the configuration options map directly to tenacity's retry parameters. For details on the tenacity library, see https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api

The current implementation includes:

- stop_after_attempt: Maximum number of retry attempts.
- stop_after_delay: Give up on retries one attempt before the delay would be exceeded.

At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.

Property	Default	Type	Required	Description
stop_after_attempt		integer	No	Number of retry attempts before giving up. If set to None, it will not stop after any number of attempts.
stop_after_delay		integer	No	Maximum time (in seconds) to spend on retries before giving up. If set to None, it will not stop after any time delay.
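
The two stop conditions can be illustrated with a standard-library sketch. This is not tenacity and not the product's implementation: `run_with_retries` and its `clock` parameter are hypothetical, and the sketch counts `stop_after_attempt` as the total number of attempts and checks the elapsed time after each failure, which may differ in detail from the behavior described above.

```python
import time
from typing import Callable, Optional


def run_with_retries(operation: Callable[[], object],
                     stop_after_attempt: Optional[int] = None,
                     stop_after_delay: Optional[float] = None,
                     clock: Callable[[], float] = time.monotonic):
    """Call `operation` until it succeeds or a stop condition is reached."""
    if stop_after_attempt is None and stop_after_delay is None:
        raise ValueError("supply stop_after_attempt, stop_after_delay, or both")
    started = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except Exception:
            out_of_attempts = stop_after_attempt is not None and attempt >= stop_after_attempt
            out_of_time = stop_after_delay is not None and clock() - started >= stop_after_delay
            if out_of_attempts or out_of_time:
                # Give up and surface the last error to the caller.
                raise


calls = {"count": 0}


def flaky():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky, stop_after_attempt=5)
print(result, calls["count"])
```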
