Skip to main content
Version: 3.0.0

Merge Strategy

A strategy that involves merging new data with existing data by updating existing records that match the unique key.

MergeStrategy

Below are the properties for the MergeStrategy. Each property links to the specific details section further down in this page.

PropertyDefaultTypeRequiredDescription
mergeNoOptions for merge strategy.

Property Details

Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

PropertyDefaultTypeRequiredDescription
componentOne of:
  CustomPythonReadComponent
  ApplicationComponent
  AliasedTableComponent
  ExternalTableComponent
YesConfiguration options for the component.

CustomPythonReadComponent

A component that reads data using user-defined custom Python code.

PropertyDefaultTypeRequiredDescription
data_plane  One of:
    SnowflakeDataPlane
    BigQueryDataPlane
    DatabricksDataPlane
NoData Plane-specific configuration options for a component.
skipboolean
NoA boolean flag indicating whether to skip processing for the component or not.
retry_strategyNoThe retry strategy configuration options for the component if any exceptions are encountered.
descriptionstring
NoA brief description of what the model does.
metadataNoMeta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
namestringYesThe name of the model
flow_namestring
NoThe name of the flow that the component belongs to.
data_maintenanceNoThe data maintenance configuration options for the component.
testsNoDefines tests to run on the data of this component.
custom_python_readYes

CustomPythonReadOptions

Configuration options for the Custom Python Read component.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
strategyfullAny of:
  full
  IncrementalStrategy
  PartitionedStrategy
NoIngest strategy.
pythonAny of:
YesPython code to execute for ingesting data.

ReadComponent

A component that reads data from a data system.

PropertyDefaultTypeRequiredDescription
data_plane  One of:
    SnowflakeDataPlane
    BigQueryDataPlane
    DatabricksDataPlane
NoData Plane-specific configuration options for a component.
skipboolean
NoA boolean flag indicating whether to skip processing for the component or not.
retry_strategyNoThe retry strategy configuration options for the component if any exceptions are encountered.
descriptionstring
NoA brief description of what the model does.
metadataNoMeta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
namestringYesThe name of the model
flow_namestring
NoThe name of the flow that the component belongs to.
data_maintenanceNoThe data maintenance configuration options for the component.
testsNoDefines tests to run on the data of this component.
readOne of:
  GenericFileReadComponent
  LocalFileReadComponent
  SFTPReadComponent
  S3ReadComponent
  GcsReadComponent
  AbfsReadComponent
  HttpReadComponent
  MSSQLReadComponent
  MySQLReadComponent
  OracleReadComponent
  PostgresReadComponent
  SnowflakeReadComponent
  BigQueryReadComponent
  DatabricksReadComponent
YesThe read component that reads data from a data system.

BigQueryReadComponent

A component that reads data from a BigQuery table.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
connectionstring
NoThe name of the connection to use for reading data.
columnsarray[None]
NoA list specifying the columns to read from the source and transformations to make during read.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved after reading.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase after reading.
strategyAny of:
  full
  IncrementalReadStrategy
  PartitionedStrategy
NoIngest strategy options.
read_optionsNoOptions for reading from the database or warehouse.
bigqueryAny of:
Yes

DatabricksReadComponent

A component that reads data from a Databricks table.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
connectionstring
NoThe name of the connection to use for reading data.
columnsarray[None]
NoA list specifying the columns to read from the source and transformations to make during read.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved after reading.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase after reading.
strategyAny of:
  full
  IncrementalReadStrategy
  PartitionedStrategy
NoIngest strategy options.
read_optionsNoOptions for reading from the database or warehouse.
databricksAny of:
Yes

MSSQLReadComponent

A component that reads data from a MSSQL Server database, options include ingesting a single table / query, or multiple tables / queries.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
connectionstring
NoThe name of the connection to use for reading data.
columnsarray[None]
NoA list specifying the columns to read from the source and transformations to make during read.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved after reading.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase after reading.
strategyAny of:
  full
  IncrementalReadStrategy
  PartitionedStrategy
NoIngest strategy options.
read_optionsNoOptions for reading from the database or warehouse.
mssqlAny of:
Yes

MySQLReadComponent

A component that reads data from a MySQL database, options include ingesting a single table / query, or multiple tables / queries.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
connectionstring
NoThe name of the connection to use for reading data.
columnsarray[None]
NoA list specifying the columns to read from the source and transformations to make during read.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved after reading.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase after reading.
strategyAny of:
  full
  IncrementalReadStrategy
  PartitionedStrategy
NoIngest strategy options.
read_optionsNoOptions for reading from the database or warehouse.
use_duckdbboolean
NoUse DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False
mysqlAny of:
YesMySQL read options.
use_checksumboolean
NoUse table checksum to detect data changes. if false or unset, will do full re-read for every run for full-sync.

OracleReadComponent

A component that reads data from an Oracle table.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
connectionstring
NoThe name of the connection to use for reading data.
columnsarray[None]
NoA list specifying the columns to read from the source and transformations to make during read.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved after reading.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase after reading.
strategyAny of:
  full
  IncrementalReadStrategy
  PartitionedStrategy
NoIngest strategy options.
read_optionsNoOptions for reading from the database or warehouse.
oracleOracleAny of:
NoOracle read options.

PostgresReadComponent

A component that reads data from a Postgresql table.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
connectionstring
NoThe name of the connection to use for reading data.
columnsarray[None]
NoA list specifying the columns to read from the source and transformations to make during read.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved after reading.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase after reading.
strategyAny of:
  full
  IncrementalReadStrategy
  PartitionedStrategy
NoIngest strategy options.
read_optionsNoOptions for reading from the database or warehouse.
use_duckdbboolean
NoUse DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False
postgresPostgresAny of:
NoPostgres read options.

SnowflakeReadComponent

A component that reads data from a Snowflake table.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
connectionstring
NoThe name of the connection to use for reading data.
columnsarray[None]
NoA list specifying the columns to read from the source and transformations to make during read.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved after reading.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase after reading.
strategyAny of:
  full
  IncrementalReadStrategy
  PartitionedStrategy
NoIngest strategy options.
read_optionsNoOptions for reading from the database or warehouse.
snowflakeAny of:
Yes

IncrementalReadStrategy

Incremental Read Strategy for database read components - this is a combination of the replication strategy that defines how new data is read from the source, and the incremental strategy that defines how this new data is materialized in the output.

PropertyDefaultTypeRequiredDescription
replication  One of:
    Any of:
      cdc
    Any of:
      incremental
NoReplication strategy to use for data synchronization.
incrementalAny of:
  append
  MergeStrategy
YesIncremental processing strategy.
on_schema_changestring ("ignore", "fail", "append_new_columns", "sync_all_columns")
NoPolicy to apply when schema changes are detected. Defaults to 'fail' if not provided.

TransformComponent

A component that executes SQL or Python code to transform data.

PropertyDefaultTypeRequiredDescription
data_plane  One of:
    SnowflakeDataPlane
    BigQueryDataPlane
    DatabricksDataPlane
NoData Plane-specific configuration options for a component.
skipboolean
NoA boolean flag indicating whether to skip processing for the component or not.
retry_strategyNoThe retry strategy configuration options for the component if any exceptions are encountered.
descriptionstring
NoA brief description of what the model does.
metadataNoMeta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
namestringYesThe name of the model
flow_namestring
NoThe name of the flow that the component belongs to.
data_maintenanceNoThe data maintenance configuration options for the component.
testsNoDefines tests to run on the data of this component.
transformOne of:
  SqlTransform
  PythonTransform
  SnowparkTransform
  PySparkTransform
YesThe transform component that executes SQL or Python code to transform data.

PySparkTransform

PySpark transforms execute PySpark code to transform data.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
microbatchboolean
NoWhether to process data in microbatches.
batch_sizestring
NoThe size/time granularity of the microbatch to process.
lookback1integerNoThe number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
beginstring
NoThe 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputsarray[None]
NoList of input components to use as data sources for the transform.
strategyAny of:
  PartitionedStrategy
  IncrementalStrategy
  string ("view", "table")
NoTransform strategy - incremental, partitioned, or view/table.
pysparkNoPySpark transform function to execute for transforming the data.

PythonTransform

Python transforms execute Python code to transform data.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
microbatchboolean
NoWhether to process data in microbatches.
batch_sizestring
NoThe size/time granularity of the microbatch to process.
lookback1integerNoThe number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
beginstring
NoThe 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputsarray[None]
NoList of input components to use as data sources for the transform.
strategyAny of:
  PartitionedStrategy
  IncrementalStrategy
  string ("view", "table")
NoTransform strategy - incremental, partitioned, or view/table.
pythonNoPython transform function to execute for transforming the data.

SnowparkTransform

Snowpark transforms execute Python code to transform data within the Snowflake platform.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
microbatchboolean
NoWhether to process data in microbatches.
batch_sizestring
NoThe size/time granularity of the microbatch to process.
lookback1integerNoThe number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
beginstring
NoThe 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputsarray[None]
NoList of input components to use as data sources for the transform.
strategyAny of:
  PartitionedStrategy
  IncrementalStrategy
  string ("view", "table")
NoTransform strategy - incremental, partitioned, or view/table.
snowparkNoSnowpark transform function to execute for transforming the data.

SqlTransform

SQL transforms execute SQL queries to transform data.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
event_timestring
NoTimestamp column in the component output used to represent event time.
microbatchboolean
NoWhether to process data in microbatches.
batch_sizestring
NoThe size/time granularity of the microbatch to process.
lookback1integerNoThe number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
beginstring
NoThe 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputsarray[None]
NoList of input components to use as data sources for the transform.
strategyAny of:
  PartitionedStrategy
  IncrementalStrategy
  string ("view", "table")
NoTransform strategy - incremental, partitioned, or view/table.
sqlstring
NoSQL query to execute for transforming the data.
dialectspark
NoSQL dialect to use for the query. Set to 'None' for the data plane's default dialect, or 'spark' for Spark SQL.

IncrementalStrategy

Incremental Processing Strategy.

PropertyDefaultTypeRequiredDescription
incrementalAny of:
  append
  MergeStrategy
YesIncremental processing strategy.
on_schema_changestring ("ignore", "fail", "append_new_columns", "sync_all_columns")
NoPolicy to apply when schema changes are detected. Defaults to 'fail' if not provided.

WriteComponent

PropertyDefaultTypeRequiredDescription
skipboolean
NoA boolean flag indicating whether to skip processing for the component or not.
retry_strategyNoThe retry strategy configuration options for the component if any exceptions are encountered.
descriptionstring
NoA brief description of what the model does.
metadataNoMeta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
namestringYesThe name of the model
flow_namestring
NoThe name of the flow that the component belongs to.
writeOne of:
  BigQueryWriteComponent
  SnowflakeWriteComponent
  S3WriteComponent
  SFTPWriteComponent
  GcsWriteComponent
  AbfsWriteComponent
  MySQLWriteComponent
  OracleWriteComponent
Yes

BigQueryWriteComponent

A component that writes data to a BigQuery table.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
connectionstringYesThe name of the connection to use for writing data.
inputYesInput component name.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention when writing.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved when writing.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase when writing.
strategyfull:
  mode: drop_and_recreate
Any of:
  snapshot
NoResource for write strategy.
pre_sqlAny of:
  string
  array[string]
NoSQL statements to execute before the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
post_sqlAny of:
  string
  array[string]
NoSQL statements to execute after the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
bigqueryYes

MySQLWriteComponent

A component that writes data to a MySQL table

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
connectionstringYesThe name of the connection to use for writing data.
inputYesInput component name.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention when writing.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved when writing.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase when writing.
strategyfull:
  mode: drop_and_recreate
Any of:
  snapshot
NoResource for write strategy.
pre_sqlAny of:
  string
  array[string]
NoSQL statements to execute before the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
post_sqlAny of:
  string
  array[string]
NoSQL statements to execute after the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
mysqlYes

OracleWriteComponent

A component that writes data to a Oracle table

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
connectionstringYesThe name of the connection to use for writing data.
inputYesInput component name.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention when writing.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved when writing.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase when writing.
strategyfull:
  mode: drop_and_recreate
Any of:
  snapshot
NoResource for write strategy.
pre_sqlAny of:
  string
  array[string]
NoSQL statements to execute before the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
post_sqlAny of:
  string
  array[string]
NoSQL statements to execute after the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
oracleYes

SnowflakeWriteComponent

A component that writes data to a Snowflake table.

PropertyDefaultTypeRequiredDescription
dependenciesarray[None]
NoList of dependencies that must complete before this component runs.
connectionstringYesThe name of the connection to use for writing data.
inputYesInput component name.
normalizeboolean
NoA boolean flag indicating if the output column names should be normalized to a standard naming convention when writing.
preserve_caseboolean
NoA boolean flag indicating if the case of the column names should be preserved when writing.
uppercaseboolean
NoA boolean flag indicating if the column names should be transformed to uppercase when writing.
strategyfull:
  mode: drop_and_recreate
Any of:
  snapshot
NoResource for write strategy.
pre_sqlAny of:
  string
  array[string]
NoSQL statements to execute before the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
post_sqlAny of:
  string
  array[string]
NoSQL statements to execute after the main write operation. Can be a single SQL statement as a string or multiple statements as a list of strings.
snowflakeYes

IncrementalWriteStrategyWithSchemaChange

Incremental write strategy that defines how new data is merged with existing data, along with policy for handling schema changes.

PropertyDefaultTypeRequiredDescription
incrementalAny of:
  append
  MergeStrategy
YesOptions to use when incrementally writing data to a Write component.
incremental_columnstringYesName of the column to use for tracking incremental updates to the data.
on_schema_changestring ("ignore", "fail", "append_new_columns", "sync_all_columns")
NoPolicy to apply when schema changes are detected. Defaults to 'fail' if not provided.

KeyOptions

Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.

PropertyDefaultTypeRequiredDescription
unique_keystringYesColumn or comma-separated set of columns used as a unique identifier for records, aiding in the merge process.
deletion_columnstring
NoColumn name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy will be able to detect deletions and mark them as deleted in the destination. If not provided, the merge strategy will not be able to detect deletions.
merge_update_columnsAny of:
  string
  array[string]
NoList of columns to include when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_exclude_columns.
merge_exclude_columnsAny of:
  string
  array[string]
NoList of columns to exclude when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_update_columns.
incremental_predicatesAny of:
  string
  array[string]
NoList of conditions to filter incremental data.