Incremental Read Strategy

The incremental read strategy for database read components combines two settings: the replication strategy, which defines how new data is read from the source, and the incremental strategy, which defines how that new data is materialized in the output.

IncrementalReadStrategy

IncrementalReadStrategy is defined beneath the following ancestor nodes in the YAML structure:

Component
ReadComponent
BigQueryReadComponent
DatabricksReadComponent
MSSQLReadComponent
MySQLReadComponent
OracleReadComponent
PostgresReadComponent
SnowflakeReadComponent

The properties of IncrementalReadStrategy are listed below. Each property links to its details section further down on this page.

Property	Type	Required	Description
replication	One of: CdcReplication IncrementalReplication	No	Replication strategy to use for data synchronization.
incremental	Any of: append MergeStrategy	Yes	Incremental processing strategy.
on_schema_change	string ("ignore", "fail", "append_new_columns", "sync_all_columns")	No	Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.
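
The three properties above can be combined as in the following minimal sketch. It uses only keys listed on this page; the column name `updated_at` is a hypothetical placeholder, and writing `append` as a plain string value is an assumption based on the type listing. Note that `incremental` appears twice at different levels: once under `replication` (how new rows are found) and once at the top level of the strategy (how they are written).

```yaml
# Sketch of an IncrementalReadStrategy block (values are illustrative).
strategy:
  replication:
    incremental:
      column_name: updated_at   # hypothetical tracking column
  incremental: append           # assumed string form of the "append" option
  on_schema_change: append_new_columns   # one of the four documented policies
```

If `on_schema_change` is omitted, the table above states that the policy defaults to `fail`.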

Property Details

Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

Property	Default	Type	Required	Description
component		One of: CustomPythonReadComponent ApplicationComponent AliasedTableComponent ExternalTableComponent	Yes	Configuration options for the component.

ReadComponent

A component that reads data from a data system.

Property	Type	Required	Description
data_plane	One of: SnowflakeDataPlane BigQueryDataPlane DatabricksDataPlane	No	Data Plane-specific configuration options for a component.
skip	boolean	No	A boolean flag indicating whether to skip processing for the component or not.
retry_strategy		No	The retry strategy configuration options for the component if any exceptions are encountered.
description	string	No	A brief description of what the model does.
metadata		No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name	string	Yes	The name of the model.
flow_name	string	No	The name of the flow that the component belongs to.
data_maintenance		No	The data maintenance configuration options for the component.
tests		No	Defines tests to run on the data of this component.
read	One of: GenericFileReadComponent LocalFileReadComponent SFTPReadComponent S3ReadComponent GcsReadComponent AbfsReadComponent HttpReadComponent MSSQLReadComponent MySQLReadComponent OracleReadComponent PostgresReadComponent SnowflakeReadComponent BigQueryReadComponent DatabricksReadComponent	Yes	The read component that reads data from a data system.

BigQueryReadComponent

A component that reads data from a BigQuery table.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
bigquery	Any of:	Yes

DatabricksReadComponent

A component that reads data from a Databricks table.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
databricks	Any of:	Yes

MSSQLReadComponent

A component that reads data from an MSSQL Server database. Options include ingesting a single table or query, or multiple tables or queries.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
mssql	Any of:	Yes

MySQLReadComponent

A component that reads data from a MySQL database. Options include ingesting a single table or query, or multiple tables or queries.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
use_duckdb	boolean	No	Use the DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False.
mysql	Any of:	Yes	MySQL read options.
use_checksum	boolean	No	Use the table checksum to detect data changes. If false or unset, a full-sync read re-reads the entire table on every run.

OracleReadComponent

A component that reads data from an Oracle table.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
connection		string	No	The name of the connection to use for reading data.
columns		array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize		boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case		boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase		boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy		Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options			No	Options for reading from the database or warehouse.
oracle	Oracle	Any of:	No	Oracle read options.

PostgresReadComponent

A component that reads data from a PostgreSQL table.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
connection		string	No	The name of the connection to use for reading data.
columns		array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize		boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case		boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase		boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy		Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options			No	Options for reading from the database or warehouse.
use_duckdb		boolean	No	Use the DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False.
postgres	Postgres	Any of:	No	Postgres read options.
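
To show where the strategy sits in a full component definition, here is a hedged sketch of a Postgres read component. The nesting follows the ancestor list at the top of this page (`component`, then `read`, then the database-specific properties). The connection name, description, and column name are hypothetical, and the contents of the `postgres` block are left out because they are not documented in this section.

```yaml
# Sketch only: names and values are placeholders, not taken from a real project.
component:
  description: Incrementally reads new order rows from Postgres.
  read:
    connection: my_postgres     # hypothetical connection name
    use_duckdb: false           # documented default
    # postgres: ...             # source table/query options, not covered on this page
    strategy:
      replication:
        incremental:
          column_name: updated_at   # hypothetical tracking column
      incremental: append
```

The other database read components listed on this page expose the same `strategy` property, so the `strategy` block should carry over unchanged under that assumption.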

SnowflakeReadComponent

A component that reads data from a Snowflake table.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
snowflake	Any of:	Yes

CdcReplication

Selects Change Data Capture (CDC) as the replication strategy.

Property	Default	Type	Required	Description
cdc			No	Resource for Change Data Capture (CDC), enabling incremental data capture based on changes.

CdcOptions

No properties defined.
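
Because CdcOptions defines no properties, selecting CDC plausibly reduces to an empty mapping under `cdc`. The sketch below assumes that `cdc` takes a CdcOptions value and pairs it with a merge on a hypothetical `id` key; neither the empty-mapping form nor the pairing is confirmed by this page.

```yaml
# Sketch: CDC replication with no options, merged on an assumed key column.
strategy:
  replication:
    cdc: {}            # CdcOptions has no properties, so an empty mapping is assumed
  incremental:
    merge:
      unique_key: id   # hypothetical key column
```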

IncrementalReplication

Selects incremental, column-based reading as the replication strategy.

Property	Default	Type	Required	Description
incremental			No	Resource for incremental data reading based on a specific column.

IncrementalColumn

Specifies the column to be used for incremental reading.

Property	Default	Type	Required	Description
column_name		string	Yes	Name of the column to use for tracking incremental updates to the data.
start_value		Any of: string integer number	No	Initial value to start reading data from the specified column.
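
As an illustration of the two properties above, the following sketch sets a tracking column together with a starting point. The column names and values are hypothetical, and the timestamp format shown is an assumption; the table only states that `start_value` may be a string, integer, or number.

```yaml
# Sketch: start reading from a given timestamp (string form).
replication:
  incremental:
    column_name: updated_at
    start_value: "2024-01-01T00:00:00Z"   # assumed format, quoted so it stays a string

# Alternative with an integer column, shown as a comment to keep this a single document:
# replication:
#   incremental:
#     column_name: id
#     start_value: 1000
```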

SCDType2Strategy

The SCD Type 2 strategy lets users track changes to records over time by recording the start and end times for each version of a record. A brief overview of the strategy can be found at https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row.

Property	Default	Type	Required	Description
scd_type_2			No	Options for SCD Type 2 strategy.

MergeStrategy

A strategy that involves merging new data with existing data by updating existing records that match the unique key.

Property	Default	Type	Required	Description
merge			No	Options for merge strategy.

KeyOptions

Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.

Property	Type	Required	Description
unique_key	string	Yes	Column or comma-separated set of columns used as a unique identifier for records, aiding in the merge process.
deletion_column	string	No	Column name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy will be able to detect deletions and mark them as deleted in the destination. If not provided, the merge strategy will not be able to detect deletions.
merge_update_columns	Any of: string array[string]	No	List of columns to include when updating values in merge. Mutually exclusive with `merge_exclude_columns`.
merge_exclude_columns	Any of: string array[string]	No	List of columns to exclude when updating values in merge. Mutually exclusive with `merge_update_columns`.
incremental_predicates	Any of: string array[string]	No	List of conditions to filter incremental data.
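
The key options above might be combined as in this sketch of a merge configuration. It assumes that the `merge` property of MergeStrategy accepts KeyOptions, as the KeyOptions description suggests. All column names are hypothetical. `incremental_predicates` is omitted because this page does not document the syntax of its conditions.

```yaml
# Sketch: merge on a composite key, honor soft deletes, never overwrite created_at.
incremental:
  merge:
    unique_key: order_id,line_number   # comma-separated set of columns
    deletion_column: is_deleted        # hypothetical soft-delete flag in the source
    merge_exclude_columns:
      - created_at                     # do not set merge_update_columns alongside this
```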