Incremental Strategy

Incremental processing strategy.

IncrementalStrategy is defined beneath ancestor nodes in the YAML structure; the components and strategy options that reference it are documented under Property Details below.

Below are the properties for IncrementalStrategy. Each property links to its details section further down this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
incremental | | Any of: append, MergeStrategy | Yes | Incremental processing strategy. |
on_schema_change | fail | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided. |
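For illustration, here is a minimal sketch of an IncrementalStrategy in YAML, assuming it appears as the `strategy` value of a read or transform component as documented below; the column name is hypothetical:

```yaml
# Sketch: incremental processing with a merge strategy.
strategy:
  incremental:
    merge:
      unique_key: order_id              # hypothetical unique-key column
  on_schema_change: append_new_columns  # add new source columns instead of failing
```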
Property Details
Component
A component is a fundamental building block of a data flow. Supported component types include read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |
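For orientation, a component might be declared like this in YAML; the layout is an assumption based on the ancestor structure, and `raw_orders` is a hypothetical name:

```yaml
# Sketch: the component wrapper holds one of the supported component types.
component:
  name: raw_orders
  custom_python_read:
    # ...options documented under CustomPythonReadOptions below...
```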
CustomPythonReadComponent
A component that reads data using user-defined custom Python code.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component. |
retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
description | | string | No | A brief description of what the model does. |
metadata | | | No | Meta information of a resource. In most cases it doesn't affect system behavior, but it may be helpful when analyzing project resources. |
name | | string | Yes | The name of the model. |
flow_name | | string | No | The name of the flow that the component belongs to. |
data_maintenance | | | No | The data maintenance configuration options for the component. |
tests | | | No | Defines tests to run on the data of this component. |
custom_python_read | | CustomPythonReadOptions | Yes | Configuration options for the Custom Python Read component (see below). |
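A hedged sketch of a CustomPythonReadComponent using only the properties in the table above; the component and flow names are hypothetical:

```yaml
# Sketch: a custom Python read component with common component-level fields.
component:
  name: raw_orders
  description: Reads raw order data from an external API.
  flow_name: orders_flow
  skip: false
  custom_python_read:
    # ...see CustomPythonReadOptions below...
```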
CustomPythonReadOptions
Configuration options for the Custom Python Read component.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
strategy | full | Any of: full, IncrementalStrategy, PartitionedStrategy | No | Ingest strategy. |
python | | Any of: | Yes | Python code to execute for ingesting data. |
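A sketch of the read options with an incremental ingest strategy. The accepted shapes of `python` are not fully listed above, so the inline-string form shown here is an assumption, as are the column names:

```yaml
custom_python_read:
  event_time: updated_at       # event-time column in the output
  strategy:
    incremental:
      merge:
        unique_key: id         # hypothetical unique key
    on_schema_change: ignore
  python: |
    # placeholder for user-defined ingestion code; exact signature not specified here
```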
TransformComponent
A component that executes SQL or Python code to transform data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component. |
retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
description | | string | No | A brief description of what the model does. |
metadata | | | No | Meta information of a resource. In most cases it doesn't affect system behavior, but it may be helpful when analyzing project resources. |
name | | string | Yes | The name of the model. |
flow_name | | string | No | The name of the flow that the component belongs to. |
data_maintenance | | | No | The data maintenance configuration options for the component. |
tests | | | No | Defines tests to run on the data of this component. |
transform | | One of: SqlTransform, PythonTransform, SnowparkTransform, PySparkTransform | Yes | The transform to execute: SQL or Python code that transforms data. |
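For orientation, a transform component sketch, assuming the transform object is keyed by its code type as in the sub-tables below; all names are hypothetical:

```yaml
component:
  name: orders_clean
  transform:
    inputs:
      - raw_orders        # hypothetical upstream component
    strategy: view        # string form of the strategy
    sql: SELECT * FROM raw_orders
```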
PySparkTransform
PySpark transforms execute PySpark code to transform data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
microbatch | | boolean | No | Whether to process data in microbatches. |
batch_size | | string | No | The size/time granularity of the microbatch to process. |
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of it) to process in time-series processing mode. |
begin | | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run. |
inputs | | array | No | List of input components to use as data sources for the transform. |
strategy | | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: incremental, partitioned, or view/table. |
pyspark | | | No | PySpark transform function to execute for transforming the data. |
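A hedged sketch of a PySpark transform using the microbatch fields above; the `batch_size` value and the shape of `pyspark` are assumptions, as are all names:

```yaml
transform:
  inputs:
    - events              # hypothetical upstream component
  event_time: event_ts
  microbatch: true
  batch_size: hour        # assumed granularity value; accepted values not listed here
  lookback: 2             # process the current interval plus one prior interval
  begin: "2024-01-01"
  pyspark: |
    # placeholder for the PySpark transform function; exact signature not specified here
```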
PythonTransform
Python transforms execute Python code to transform data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
microbatch | | boolean | No | Whether to process data in microbatches. |
batch_size | | string | No | The size/time granularity of the microbatch to process. |
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of it) to process in time-series processing mode. |
begin | | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run. |
inputs | | array | No | List of input components to use as data sources for the transform. |
strategy | | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: incremental, partitioned, or view/table. |
python | | | No | Python transform function to execute for transforming the data. |
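A sketch of a Python transform combined with the IncrementalStrategy documented above; column and component names are hypothetical:

```yaml
transform:
  inputs:
    - raw_orders
  strategy:
    incremental:
      merge:
        unique_key: order_id
    on_schema_change: sync_all_columns   # keep destination columns in sync with source
  python: |
    # placeholder for the Python transform function
```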
SnowparkTransform
Snowpark transforms execute Python code to transform data within the Snowflake platform.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
microbatch | | boolean | No | Whether to process data in microbatches. |
batch_size | | string | No | The size/time granularity of the microbatch to process. |
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of it) to process in time-series processing mode. |
begin | | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run. |
inputs | | array | No | List of input components to use as data sources for the transform. |
strategy | | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: incremental, partitioned, or view/table. |
snowpark | | | No | Snowpark transform function to execute for transforming the data. |
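A minimal Snowpark sketch using the string form of the strategy; names are hypothetical:

```yaml
transform:
  inputs:
    - raw_customers
  strategy: table          # materialize the result as a table
  snowpark: |
    # placeholder for the Snowpark transform function
```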
SqlTransform
SQL transforms execute SQL queries to transform data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
microbatch | | boolean | No | Whether to process data in microbatches. |
batch_size | | string | No | The size/time granularity of the microbatch to process. |
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of it) to process in time-series processing mode. |
begin | | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run. |
inputs | | array | No | List of input components to use as data sources for the transform. |
strategy | | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: incremental, partitioned, or view/table. |
sql | | string | No | SQL query to execute for transforming the data. |
dialect | spark | | No | SQL dialect to use for the query. Set to 'None' for the data plane's default dialect, or 'spark' for Spark SQL. |
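A sketch of a SQL transform with append-only incremental processing; table and column names are hypothetical:

```yaml
transform:
  inputs:
    - raw_orders
  strategy:
    incremental: append    # append new rows without merging
    on_schema_change: fail
  dialect: spark
  sql: |
    SELECT order_id, amount, updated_at
    FROM raw_orders
```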
SCDType2Strategy
The SCD Type 2 strategy lets users track changes to records over time by recording the start and end times of each version of a record. A brief overview of the strategy is available at https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row.
Property | Default | Type | Required | Description |
---|---|---|---|---|
scd_type_2 | | KeyOptions | No | Options for SCD Type 2 strategy (see KeyOptions below). |
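The source table leaves the type of `scd_type_2` unstated; assuming it accepts the KeyOptions documented below, a standalone sketch with hypothetical column names might look like:

```yaml
scd_type_2:
  unique_key: customer_id       # hypothetical natural key
  deletion_column: is_deleted   # soft-delete flag in the source
```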
MergeStrategy
A strategy that involves merging new data with existing data by updating existing records that match the unique key.
Property | Default | Type | Required | Description |
---|---|---|---|---|
merge | | KeyOptions | No | Options for merge strategy (see KeyOptions below). |
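A sketch showing how a MergeStrategy slots into the incremental field of an IncrementalStrategy; the key column is hypothetical:

```yaml
incremental:
  merge:
    unique_key: order_id    # records matching this key are updated in place
```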
KeyOptions
Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.
Property | Default | Type | Required | Description |
---|---|---|---|---|
unique_key | | string | Yes | Column or comma-separated set of columns used as a unique identifier for records, aiding in the merge process. |
deletion_column | | string | No | Column name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy can detect deletions and mark them as deleted in the destination; if not provided, deletions cannot be detected. |
merge_update_columns | | Any of: string, array[string] | No | List of columns to include when updating values in merge. Mutually exclusive with merge_exclude_columns. |
merge_exclude_columns | | Any of: string, array[string] | No | List of columns to exclude when updating values in merge. Mutually exclusive with merge_update_columns. |
incremental_predicates | | Any of: string, array[string] | No | List of conditions to filter incremental data. |
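A fuller KeyOptions sketch under a merge strategy; the predicate syntax and all names are assumptions:

```yaml
merge:
  unique_key: order_id, order_line_id   # comma-separated composite key
  deletion_column: _deleted             # soft-delete flag from the source
  merge_update_columns:                 # use either this or merge_exclude_columns, not both
    - status
    - amount
  incremental_predicates:
    - "target.updated_at < source.updated_at"   # hypothetical predicate syntax
```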