Defines the parameters for a backfill run.

## BackfillRun

Below are the properties for the BackfillRun. Each property links to the specific details section further down in this page.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| backfill_run | | | Yes | Backfill run options. |
## Property Details

### BackfillRunOptions

Options for a backfill run.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| description | | string | No | A brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| flow_name | | string | Yes | The name of the flow that is to be backfilled. |
| start_time | | string | Yes | Start time of the time range to be backfilled. |
| end_time | | string | Yes | End time of the time range to be backfilled. |
| granularity | | string ("day", "week", "month") | Yes | The time granularity to use for the backfill. The backfill runner divides the date range into flow runs of this granularity and launches them. |
| max_concurrent_flow_runs | 1 | integer | No | The maximum number of concurrent flow runs used for the backfill. This limits the number of flow runners (and hence cluster resources) launched at once. |
| backfill_order | | string ("forward_chronological", "reverse_chronological") | No | The order to use for backfilling: forward or reverse chronological. |
| flow_run_options | | | No | Additional options for each flow run launched during the backfill. |
| run_final_sync | | boolean | No | Whether to run a final sync after the concurrent backfill flow runs complete. This final sync is a single flow run executed without any time parameters, meant to sync the data to the latest state and capture any missing time intervals. |
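Putting the properties above together, a minimal YAML sketch of a backfill run. All names and values are illustrative; the top-level `backfill_run` key mirrors the BackfillRun table:

```yaml
backfill_run:
  name: daily_events_backfill
  flow_name: daily_events
  start_time: "2024-01-01T00:00:00Z"
  end_time: "2024-03-31T23:59:59Z"
  granularity: day
  max_concurrent_flow_runs: 4
  backfill_order: forward_chronological
  run_final_sync: true
```

With `granularity: day`, this range is divided into one flow run per day, with at most four running concurrently.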
### FlowRunBaseOptions

Base options for a flow run.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| parameters | | object with property values of type None | No | Dictionary of parameters to use for the resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | A brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| run_tests | True | boolean | No | Whether to run tests after processing the data. |
| store_test_results | | boolean | No | Whether to store the test results. |
| components | | array[string] | No | List of component names to run. |
| component_categories | | array[string] | No | List of component categories to run. |
| halt_flow_on_error | | boolean | No | Whether to halt the flow on error. |
| disable_optimizers | | boolean | No | Whether to disable optimizers. |
| disable_incremental_metadata_collection | | boolean | No | Whether to disable collection of incremental RC/Transform metadata. |
| full_refresh | False | boolean | No | Whether to perform a full refresh of each component. ⚠ If true, all internal data and metadata tables/views are dropped and re-computed from scratch. |
| update_materialization_type | False | boolean | No | Whether to update component materialization types (e.g., changing between 'simple', 'view', 'incremental', and 'smart'). ⚠ If materialization type changes are detected, existing data and metadata tables/views are dropped and re-computed from scratch. Otherwise, existing data and metadata tables/views are preserved, and type changes result in an error. |
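A hedged sketch of `flow_run_options` inside a backfill run, using fields from the FlowRunBaseOptions table. Component names and values are illustrative:

```yaml
backfill_run:
  name: daily_events_backfill
  flow_name: daily_events
  start_time: "2024-01-01T00:00:00Z"
  end_time: "2024-01-31T23:59:59Z"
  granularity: week
  flow_run_options:
    run_tests: true
    halt_flow_on_error: true
    components: [stg_events, fct_events]
    full_refresh: false
```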
### ConfigFilter

A filter used to target configuration settings to a specific flow and/or component.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| kind | | string ("Flow", "Component") | Yes | The kind of the resource to apply the config to. |
| name | | Any of: string, array[string], array[None] | Yes | Name of the resource to apply the config to. |
| flow_name | | string | No | Name of the flow to apply the config to. |
| spec | | Any of: ComponentSpec, FlowSpec | No | Dictionary of parameters to use for the resource. |
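A hypothetical sketch of a `defaults` entry that uses a ConfigFilter to skip one component. The exact nesting of `defaults` entries is an assumption based on the ConfigFilter and ComponentSpec tables; the names are illustrative:

```yaml
defaults:
  - kind: Component
    name: stg_events
    flow_name: daily_events
    spec:
      skip: true
```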
### ComponentSpec

Specification for configuration applied to a component at runtime based on the config filter.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| skip | | boolean | No | Whether to skip processing for the component. |
| retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DatabricksDataPlane | No | Data plane-specific configuration options for a component. |
### FlowSpec

Specification for configuration applied to a flow at runtime based on the config filter.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | | No | The data plane that will be used for the flow at runtime. |
### DataPlane

The external warehouse where data is persisted throughout the flow runs, and where primary computation on the data itself occurs.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this flow. The prefix may include database/project/etc and schema/dataset/etc where applicable. If not provided, metadata tables are stored alongside the output data tables per the data plane's connection configuration. |
### RegexFilter

A filter used to target resources based on a regex pattern.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| regex | | string | Yes | The regex to filter the resources. |
### BigQueryDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| bigquery | | | Yes | BigQuery configuration options. |

### BigQueryDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
### BigQueryRangePartitioning

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| field | | string | Yes | Field to partition by. |
| range | | | Yes | Range partitioning options. |

### BigQueryTimePartitioning

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
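A hedged sketch of a BigQuery data plane with daily time partitioning and clustering. The nesting under `data_plane.bigquery` follows the BigQueryDataPlane table, and the `field`/`granularity` shape follows BigQueryTimePartitioning; the column names are illustrative:

```yaml
data_plane:
  bigquery:
    partition_by:
      field: event_date
      granularity: DAY
    cluster_by: [customer_id]
```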
### DatabricksDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| databricks | cluster_by: null, pyspark_job_cluster_id: null, table_properties: null | | No | Databricks configuration options. |

### DatabricksDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your data plane. |
| pyspark_job_cluster_id | | string | No | The ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
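A hedged sketch of a Databricks data plane. The `delta.*` keys are standard Delta Lake table properties from the Databricks documentation linked above; the clustering column is illustrative:

```yaml
data_plane:
  databricks:
    table_properties:
      delta.appendOnly: "false"
      delta.logRetentionDuration: "interval 30 days"
    cluster_by: [event_date]
```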
### DuckdbDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| duckdb | | | No | DuckDB configuration options. |

### DuckDbDataPlaneOptions

No properties defined.
### FabricDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| fabric | spark_session_config: null | | No | Fabric configuration options. |

### FabricDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| spark_session_config | | | No | Spark session configuration. |
### RangeOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |
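Combining RangeOptions with BigQueryRangePartitioning, a hedged sketch of integer-range partitioning on a BigQuery data plane (the nesting and field name are illustrative assumptions based on the tables above):

```yaml
data_plane:
  bigquery:
    partition_by:
      field: customer_id
      range:
        start: 0
        end: 1000000
        interval: 10000
```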
### SnowflakeDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| snowflake | | | Yes | Snowflake configuration options. |

### SnowflakeDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
### SynapseDataPlane

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| synapse | spark_session_config: null | | No | Synapse configuration options. |

### SynapseDataPlaneOptions

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| spark_session_config | | | No | Spark session configuration. |
### LivySparkSessionConfig

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| pool | | string | No | The pool to use for the Spark session. |
| driver_memory | | string | No | The memory to use for the Spark driver. |
| driver_cores | | integer | No | The number of cores to use for the Spark driver. |
| executor_memory | | string | No | The memory to use for each Spark executor. |
| executor_cores | | integer | No | The number of cores to use for each executor. |
| num_executors | | integer | No | The number of executors to use for the Spark session. |
| session_key_override | | string | No | The key to use for the Spark session. |
| max_concurrent_sessions | | integer | No | The maximum number of concurrent sessions of this spec to create. |
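A hedged sketch of a `spark_session_config` for a Fabric or Synapse data plane. Memory strings use the usual Spark size notation (e.g. `4g`); the pool name and sizing values are illustrative:

```yaml
data_plane:
  synapse:
    spark_session_config:
      pool: small
      driver_memory: 4g
      driver_cores: 2
      executor_memory: 8g
      executor_cores: 4
      num_executors: 2
```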
Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| source | | | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | UUID of the event associated with the creation of this resource. |
### ResourceLocation

The origin or source information for the resource.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| path | | string | Yes | Path within repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within the path file where the resource is defined. |
### RetryStrategy

Retry strategy configuration for component operations. This configuration leverages the tenacity library to implement robust retry mechanisms; the options map directly to tenacity's retry parameters (see https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api). The current implementation includes:

- stop_after_attempt: maximum number of retry attempts
- stop_after_delay: give up on retries one attempt before you would exceed the delay

At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| stop_after_attempt | | integer | No | The number of attempts before giving up; if None, retrying is not stopped after any number of attempts. |
| stop_after_delay | | integer | No | Give up on retries one attempt before you would exceed the delay; if None, retrying is not stopped after any delay. |
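A hedged sketch of a `retry_strategy` inside a component spec, using the two fields from the table above. Tenacity's `stop_after_delay` is expressed in seconds, so the delay unit here is an assumption carried over from that library:

```yaml
spec:
  retry_strategy:
    stop_after_attempt: 3
    stop_after_delay: 600
```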