# Backfill Run

Defines the parameters for a backfill run.

## BackfillRun

Below are the properties for the BackfillRun. Each property links to its details section further down this page.

Property | Default | Type | Required | Description |
---|---|---|---|---|
backfill_run | | BackfillRunOptions | Yes | Backfill run options. |
## Property Details

### BackfillRunOptions

Options for a backfill run.

Property | Default | Type | Required | Description |
---|---|---|---|---|
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
name | | string | Yes | The name of the model. |
flow_name | | string | Yes | The name of the flow to be backfilled. |
start_time | | string | Yes | Start time of the time range to be backfilled. |
end_time | | string | Yes | End time of the time range to be backfilled. |
granularity | | string ("day", "week", "month") | Yes | The time granularity to use for the backfill. The backfill runner divides the date range into flow runs of this granularity and launches those flow runs. |
max_concurrent_flow_runs | 1 | integer | No | The maximum number of concurrent flow runs used for the backfill. This limits the number of flow runners (and hence cluster resources) launched at once. |
backfill_order | | string ("forward_chronological", "reverse_chronological") | No | The order to use for backfilling: forward or reverse chronological. |
flow_run_options | | FlowRunBaseOptions | No | Additional options for each flow run launched during the backfill. |
run_final_sync | | boolean | No | Whether to run a final sync after the concurrent backfill flow runs complete. The final sync is a single flow run executed without any time parameters; it syncs the data to the latest state and captures any missing time intervals. |
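Putting the options above together, a minimal backfill run configuration might look like the following sketch. The YAML layout is illustrative; property names come from the table above, and all values (including the timestamp format) are placeholders:

```yaml
backfill_run:
  name: orders_backfill                  # hypothetical resource name
  description: Backfill the orders flow for Q1.
  flow_name: orders_flow                 # hypothetical flow name
  start_time: "2024-01-01T00:00:00Z"     # timestamp format is an assumption
  end_time: "2024-04-01T00:00:00Z"
  granularity: day                       # one of: day, week, month
  max_concurrent_flow_runs: 4            # limits flow runners launched at once
  backfill_order: forward_chronological
  run_final_sync: true                   # single final run without time parameters
```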
### FlowRunBaseOptions

Base options for a flow run.

Property | Default | Type | Required | Description |
---|---|---|---|---|
parameters | | object | No | Dictionary of parameters to use for the resource. |
defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
run_tests | True | boolean | No | Whether to run tests after processing the data. |
store_test_results | | boolean | No | Whether to store the test results. |
components | | array[string] | No | List of component names to run. |
component_categories | | array[string] | No | List of component categories to run. |
halt_flow_on_error | | boolean | No | Whether to halt the flow on error. |
disable_optimizers | | boolean | No | Whether to disable optimizers. |
disable_incremental_metadata_collection | | boolean | No | Whether to disable collection of incremental RC/Transform metadata. |
full_refresh | False | boolean | No | Whether to perform a full refresh of each component. ⚠️ If true, all internal data and metadata tables/views are dropped and re-computed from scratch. |
update_materialization_type | False | boolean | No | Whether to update component materialization types (e.g., changing between 'simple', 'view', 'incremental', and 'smart'). ⚠️ If materialization type changes are detected, existing data and metadata tables/views are dropped and re-computed from scratch; otherwise they are preserved, and type changes result in an error. |
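For example, per-run behavior during a backfill can be tuned via `flow_run_options`. The fragment below is a sketch using fields from the table above; component names and values are placeholders:

```yaml
flow_run_options:
  run_tests: false                       # skip tests during backfill runs
  halt_flow_on_error: true               # stop a flow run on the first error
  components: [stg_orders, fct_orders]   # hypothetical component names
  full_refresh: false                    # preserve existing data and metadata tables
```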
### ConfigFilter

A filter used to target configuration settings to a specific flow and/or component.

Property | Default | Type | Required | Description |
---|---|---|---|---|
kind | | string ("Flow", "Component") | Yes | The kind of the resource to apply the config to. |
name | | Any of: string, array[string], RegexFilter, array[RegexFilter] | Yes | Name of the resource to apply the config to. |
flow_name | | string | No | Name of the flow to apply the config to. |
spec | | Any of: FlowSpec, ComponentSpec | No | Dictionary of parameters to use for the resource. |
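As a sketch of how a ConfigFilter might be used, the `defaults` list of FlowRunBaseOptions pairs a filter with a spec. Here a RegexFilter targets components whose names start with `stg_` and applies a ComponentSpec that skips them; the nesting and all names are illustrative assumptions, not confirmed syntax:

```yaml
defaults:
  - kind: Component
    name:
      regex: "^stg_.*"       # RegexFilter matching component names (assumed shape)
    flow_name: orders_flow   # hypothetical flow name
    spec:
      skip: true             # ComponentSpec: skip the matched components
```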
### ComponentSpec

Specification for configuration applied to a component at runtime based on the config filter.

Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane, FabricDataPlane, DatabricksDataPlane | No | Data plane-specific configuration options for a component. |
skip | False | boolean | No | |
### FlowSpec

Specification for configuration applied to a flow at runtime based on the config filter.

Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | DataPlane | No | The data plane that will be used for the flow at runtime. |

### DataPlane

The external warehouse where data is persisted throughout the flow runs, and where primary computation on the data itself occurs.

Property | Default | Type | Required | Description |
---|---|---|---|---|
connection_name | | string | No | |
metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this flow. The prefix may include database/project/etc. and schema/dataset/etc. where applicable. If not provided, metadata tables are stored alongside the output data tables per the data plane's connection configuration. |
### RegexFilter

A filter used to target resources based on a regex pattern.

Property | Default | Type | Required | Description |
---|---|---|---|---|
regex | | string | Yes | The regex to filter the resources. |
### BigQueryDataPlane

Property | Default | Type | Required | Description |
---|---|---|---|---|
bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |

### BigQueryDataPlaneOptions

Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition-by clause for the table. |
cluster_by | | array[string] | No | Clustering keys to be added to the table. |

### BigQueryRangePartitioning

Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
range | | RangeOptions | Yes | Range partitioning options. |

### BigQueryTimePartitioning

Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
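Taken together, a component's BigQuery data plane configuration might be sketched as follows. The nesting is inferred from the types above; field names and values are placeholders:

```yaml
data_plane:
  bigquery:
    partition_by:
      field: event_date          # BigQueryTimePartitioning on a date column
      granularity: DAY           # one of: DAY, HOUR, MONTH, YEAR
    cluster_by: [customer_id, region]
```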
### DatabricksDataPlane

Property | Default | Type | Required | Description |
---|---|---|---|---|
databricks | `{cluster_by: null, pyspark_job_cluster_id: null, table_properties: null}` | DatabricksDataPlaneOptions | No | Databricks configuration options. |

### DatabricksDataPlaneOptions

Property | Default | Type | Required | Description |
---|---|---|---|---|
table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for the properties available on your data plane. |
pyspark_job_cluster_id | | string | No | The ID of the compute cluster to use for PySpark jobs. |
cluster_by | | array[string] | No | Clustering keys to be added to the table. |
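A Databricks data plane block using these options might look like the sketch below. The cluster ID and clustering keys are placeholders; the table property keys shown are standard Delta Lake properties:

```yaml
data_plane:
  databricks:
    pyspark_job_cluster_id: "1234-567890-abcde123"   # placeholder cluster ID
    cluster_by: [customer_id]
    table_properties:
      delta.appendOnly: "false"                      # standard Delta property
      delta.logRetentionDuration: "interval 30 days" # standard Delta property
```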
### DuckdbDataPlane

Property | Default | Type | Required | Description |
---|---|---|---|---|
duckdb | | DuckDbDataPlaneOptions | No | DuckDB configuration options. |

### DuckDbDataPlaneOptions

No properties defined.

### FabricDataPlane

Property | Default | Type | Required | Description |
---|---|---|---|---|
fabric | `{spark_session_config: null}` | FabricDataPlaneOptions | No | Fabric configuration options. |

### FabricDataPlaneOptions

Property | Default | Type | Required | Description |
---|---|---|---|---|
spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |
### RangeOptions

Property | Default | Type | Required | Description |
---|---|---|---|---|
start | | integer | Yes | Start of the range partitioning. |
end | | integer | Yes | End of the range partitioning. |
interval | | integer | Yes | Interval of the range partitioning. |

### SnowflakeDataPlane

Property | Default | Type | Required | Description |
---|---|---|---|---|
snowflake | | SnowflakeDataPlaneOptions | Yes | Snowflake configuration options. |

### SnowflakeDataPlaneOptions

Property | Default | Type | Required | Description |
---|---|---|---|---|
cluster_by | | array[string] | No | Clustering keys to be added to the table. |

### SynapseDataPlane

Property | Default | Type | Required | Description |
---|---|---|---|---|
synapse | `{spark_session_config: null}` | SynapseDataPlaneOptions | No | Synapse configuration options. |

### SynapseDataPlaneOptions

Property | Default | Type | Required | Description |
---|---|---|---|---|
spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |
### LivySparkSessionConfig

Property | Default | Type | Required | Description |
---|---|---|---|---|
pool | | string | No | The pool to use for the Spark session. |
driver_memory | | string | No | The memory to use for the Spark driver. |
driver_cores | | integer | No | The number of cores to use for the Spark driver. |
executor_memory | | string | No | The memory to use for each Spark executor. |
executor_cores | | integer | No | The number of cores to use for each executor. |
num_executors | | integer | No | The number of executors to use for the Spark session. |
session_key_override | | string | No | The key to use for the Spark session. |
max_concurrent_sessions | | integer | No | The maximum number of concurrent sessions of this spec to create. |
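For a Fabric or Synapse data plane, a Livy-backed session configuration might be sketched as follows. Memory strings use the standard Spark size format (e.g., "8g"); the pool name and all sizing values are placeholders:

```yaml
spark_session_config:
  pool: small_pool             # hypothetical pool name
  driver_memory: 8g            # Spark size string
  driver_cores: 4
  executor_memory: 8g
  executor_cores: 4
  num_executors: 2
  max_concurrent_sessions: 3   # cap on concurrent sessions of this spec
```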
### ResourceMetadata

Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.

Property | Default | Type | Required | Description |
---|---|---|---|---|
source | | ResourceLocation | No | The origin or source information for the resource. |
source_event_uuid | | string | No | UUID of the event associated with the creation of this resource. |

### ResourceLocation

The origin or source information for the resource.

Property | Default | Type | Required | Description |
---|---|---|---|---|
path | | string | Yes | Path within the repository files where the resource is defined. |
first_line_number | | integer | No | First line number within the path file where the resource is defined. |