Version: 3.0.0

Backfill Run

Defines the parameters for a backfill run.

BackfillRun

Below are the properties of BackfillRun. Each property type is described in its own section further down this page.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| backfill_run | | BackfillRunOptions | Yes | Backfill run options. |

Property Details

BackfillRunOptions

Options for a backfill run.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| name | | string | No | The name of the model. |
| description | | string | No | A brief description of what the model does. |
| metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| flow_name | | string | Yes | The name of the flow to be backfilled. |
| start_time | | string | Yes | Start time of the time range to be backfilled. |
| end_time | | string | Yes | End time of the time range to be backfilled. |
| granularity | | string ("day", "week", "month") | Yes | The time granularity to use for the backfill. Must be one of: 'day', 'week', 'month'. The backfill runner divides the date range into flow runs of this granularity and launches them. |
| max_concurrent_flow_runs | | integer | No | The maximum number of concurrent flow runs used for the backfill. This limits the number of flow runners (and hence cluster resources) that are launched at once. |
| backfill_order | | string ("forward_chronological", "reverse_chronological") | No | The order to use for backfilling: either forward or reverse chronological order. |
| flow_run_options | | FlowRunBaseOptions | No | Additional options for each flow run launched during the backfill. |
| run_final_sync | | boolean | No | A boolean flag indicating whether to run a final sync after the concurrent backfill flow runs have finished. This final sync is a single flow run that is executed without any time parameters, and is meant to sync the data to the latest state and capture any missing time intervals. |
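
The splitting behavior described for granularity and backfill_order can be sketched as follows. This is a minimal illustration, not the backfill runner's actual implementation: the function name `split_into_windows`, the use of half-open `[start, end)` windows, and the choice to align month windows to calendar month boundaries are all assumptions made for the example.

```python
from datetime import date, timedelta


def split_into_windows(start, end, granularity, order="forward_chronological"):
    """Split the half-open range [start, end) into windows of one granularity unit.

    Hypothetical helper: the real runner may align or clip windows differently.
    """
    if granularity not in ("day", "week", "month"):
        raise ValueError("granularity must be one of: 'day', 'week', 'month'")

    windows = []
    cursor = start
    while cursor < end:
        if granularity == "day":
            nxt = cursor + timedelta(days=1)
        elif granularity == "week":
            nxt = cursor + timedelta(weeks=1)
        else:
            # Assumption: month windows end on the first day of the next month.
            year = cursor.year + cursor.month // 12
            month = cursor.month % 12 + 1
            nxt = date(year, month, 1)
        # Clip the last window so it never extends past the requested end.
        windows.append((cursor, min(nxt, end)))
        cursor = nxt

    if order == "reverse_chronological":
        windows.reverse()
    return windows
```

Each returned window would correspond to one flow run. With max_concurrent_flow_runs set, at most that many of these runs would be active at the same time.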

FlowRunBaseOptions

Base options for a flow run.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| name | | string | No | The name of the model. |
| description | | string | No | A brief description of what the model does. |
| metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| parameters | | object | No | Dictionary of parameters to use for the resource. |
| defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
| run_tests | True | boolean | No | A boolean flag indicating whether to run tests after processing the data. |
| store_test_results | | boolean | No | A boolean flag indicating whether to store the test results. |
| components | | array[string] | No | List of component names to run. |
| component_categories | | array[string] | No | List of component categories to run. |
| halt_flow_on_error | | boolean | No | A boolean flag indicating whether to halt the flow on error. |
| disable_optimizers | | boolean | No | A boolean flag indicating whether to disable optimizers. |
| disable_incremental_metadata_collection | | boolean | No | A boolean flag indicating whether to disable collection of incremental RC/Transform metadata. |
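
To show how BackfillRunOptions and FlowRunBaseOptions nest, here is a sketch that models the options as a plain Python dictionary and checks the fields the tables mark as required. The flow and component names, the ISO 8601 timestamp format, and the `validate` helper are hypothetical and only serve the illustration. The property names and allowed granularity values come from the tables above.

```python
# Fields marked "Yes" in the Required column of BackfillRunOptions.
REQUIRED_FIELDS = ("flow_name", "start_time", "end_time", "granularity")
GRANULARITIES = ("day", "week", "month")

# Hypothetical example values; the timestamp format is an assumption.
backfill_run = {
    "flow_name": "sales_flow",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-04-01T00:00:00Z",
    "granularity": "month",
    "max_concurrent_flow_runs": 2,
    "backfill_order": "reverse_chronological",
    "run_final_sync": True,
    # FlowRunBaseOptions applied to every flow run launched by the backfill.
    "flow_run_options": {
        "run_tests": False,
        "halt_flow_on_error": True,
        "components": ["orders", "customers"],
    },
}


def validate(options):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required field: {key}"
        for key in REQUIRED_FIELDS
        if key not in options
    ]
    granularity = options.get("granularity")
    if granularity is not None and granularity not in GRANULARITIES:
        problems.append(f"invalid granularity: {granularity!r}")
    return problems
```

Calling `validate(backfill_run)` on the dictionary above returns an empty list, while a dictionary that only sets flow_name would report the three other required fields as missing.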

ConfigFilter

A filter used to target configuration settings to a specific flow and/or component.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| kind | | string ("Flow", "Component") | Yes | The kind of the resource to apply the config to. |
| name | | Any of: string, array[string], RegexFilter, array[RegexFilter] | Yes | Name of the resource to apply the config to. |
| flow_name | | string | No | Name of the flow to apply the config to. |
| spec | | Any of: FlowSpec, ComponentSpec | No | Dictionary of parameters to use for the resource. |
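
Because name accepts four shapes (a string, a list of strings, a RegexFilter, or a list of RegexFilter), a matching routine has to normalize them first. The sketch below shows one way this could work. It is not the system's actual matching logic: the `name_matches` function is hypothetical, plain strings are assumed to match exactly, and regex patterns are assumed to match the whole name (`re.fullmatch`).

```python
import re


def name_matches(name_filter, resource_name):
    """Check a resource name against any of the four accepted filter shapes.

    Assumptions: strings match exactly, and a RegexFilter is represented as a
    dict with a "regex" key whose pattern must match the full name.
    """
    # Normalize a single filter into a one-element list.
    filters = name_filter if isinstance(name_filter, list) else [name_filter]
    for item in filters:
        if isinstance(item, str):
            if item == resource_name:
                return True
        elif isinstance(item, dict) and "regex" in item:
            if re.fullmatch(item["regex"], resource_name):
                return True
    return False
```

For example, `name_matches({"regex": "stg_.*"}, "stg_orders")` is true under these assumptions, while `name_matches(["orders", "customers"], "payments")` is false.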

ComponentSpec

Specification for configuration applied to a component at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane | No | Data Plane-specific configuration options for a component. |
| skip | False | boolean | No | |

FlowSpec

Specification for configuration applied to a flow at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | DataPlane | No | The data plane that will be used for the flow at runtime. |

DataPlane

The external warehouse where data is persisted throughout the flow runs, and where primary computation on the data itself occurs.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this flow. The prefix may include database/project/etc and schema/dataset/etc where applicable. If not provided, metadata tables are stored alongside the output data tables per the data plane's connection configuration. |

RegexFilter

A filter used to target resources based on a regex pattern.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| regex | | string | Yes | The regex to filter the resources. |

BigQueryDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |

BigQueryDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

BigQueryRangePartitioning

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| range | | RangeOptions | Yes | Range partitioning options. |

BigQueryTimePartitioning

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |

DuckdbDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| duckdb | | DuckDbDataPlaneOptions | No | Duckdb configuration options. |

DuckDbDataPlaneOptions

No properties defined.

RangeOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |
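
To make the interaction of start, end, and interval concrete, the sketch below computes the partition a given integer value would fall into. It assumes RangeOptions maps directly onto BigQuery's integer range partitioning, where start is inclusive and end is exclusive. The `range_partition_start` helper is hypothetical and not part of any API.

```python
def range_partition_start(value, start, end, interval):
    """Return the lower bound of the partition holding `value`.

    Returns None when the value is missing or falls outside [start, end).
    Assumption: start is inclusive and end is exclusive.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    if value is None or value < start or value >= end:
        return None
    # Snap the value down to the nearest partition boundary.
    return start + ((value - start) // interval) * interval
```

With start 0, end 100, and interval 10, the value 25 lands in the partition starting at 20, while 100 is outside the range.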

SnowflakeDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| snowflake | | SnowflakeDataPlaneOptions | Yes | Snowflake configuration options. |

SnowflakeDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

SynapseDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| synapse | spark_session_config: null | SynapseDataPlaneOptions | No | Synapse configuration options. |

SynapseDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |

LivySparkSessionConfig

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| pool | | string | No | The pool to use for the Spark session. |
| driver_memory | | string | No | The memory to use for the Spark driver. |
| driver_cores | | integer | No | The number of cores to use for the Spark driver. |
| executor_memory | | string | No | The memory to use for the Spark executor. |
| executor_cores | | integer | No | The number of cores to use for each executor. |
| num_executors | | integer | No | The number of executors to use for the Spark session. |
| session_key_override | | string | No | The key to use for the Spark session. |

ResourceMetadata

Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| source | | ResourceLocation | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | UUID of the event that is associated with creation of this resource. |

ResourceLocation

The origin or source information for the resource.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | | string | Yes | Path within the repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within the file at path where the resource is defined. |