Merge Strategy
A strategy that involves merging new data with existing data by updating existing records that match the unique key.
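The merge behaviour described above can be sketched in a few lines of Python. This is only an illustration of upsert-by-key semantics, not the platform's implementation: the function name `merge_rows`, the `unique_key` argument, and the sample rows are hypothetical, and the sketch assumes that a matching record is replaced as a whole and that rows with an unseen key are appended.

```python
from typing import Any

Row = dict[str, Any]


def merge_rows(existing: list[Row], incoming: list[Row], unique_key: str) -> list[Row]:
    """Merge incoming rows into existing rows, matching on unique_key (hypothetical helper)."""
    # Index existing rows by their key; dicts keep insertion order.
    merged: dict[Any, Row] = {row[unique_key]: row for row in existing}
    for row in incoming:
        # Assumption: a matching key replaces the stored record,
        # and a key that is not present yet is appended.
        merged[row[unique_key]] = row
    return list(merged.values())


existing = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
incoming = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]
result = merge_rows(existing, incoming, unique_key="id")
```

Here the record with `id` 2 is updated in place, the record with `id` 1 is left untouched, and the record with `id` 3 is added.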
MergeStrategy
MergeStrategy is defined beneath the following ancestor nodes in the YAML structure:
- Component
- CustomPythonReadComponent
- CustomPythonReadOptions
- ReadComponent
- BigQueryReadComponent
- DatabricksReadComponent
- MSSQLReadComponent
- MySQLReadComponent
- OracleReadComponent
- PostgresReadComponent
- SnowflakeReadComponent
- IncrementalReadStrategy
- TransformComponent
- PySparkTransform
- PythonTransform
- SnowparkTransform
- SqlTransform
- IncrementalStrategy
- WriteComponent
- BigQueryWriteComponent
- MySQLWriteComponent
- OracleWriteComponent
- SnowflakeWriteComponent
- IncrementalWriteStrategyWithSchemaChange
Below are the properties for the MergeStrategy. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
merge | | | No | Options for merge strategy. |
Property Details
Component
A component is a fundamental building block of a data flow. Supported component types include read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: CustomPythonReadComponent ApplicationComponent AliasedTableComponent ExternalTableComponent | Yes | Configuration options for the component. |
CustomPythonReadComponent
A component that reads data using user-defined custom Python code.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane BigQueryDataPlane DatabricksDataPlane | No | Data Plane-specific configuration options for a component. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
description | | string | No | A brief description of what the model does. |
metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
name | | string | Yes | The name of the model. |
flow_name | | string | No | The name of the flow that the component belongs to. |
data_maintenance | | | No | The data maintenance configuration options for the component. |
tests | | | No | Defines tests to run on the data of this component. |
custom_python_read | | | Yes | |
CustomPythonReadOptions
Configuration options for the Custom Python Read component.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
strategy | full | Any of: full IncrementalStrategy PartitionedStrategy | No | Ingest strategy. |
python | | Any of: | Yes | Python code to execute for ingesting data. |
ReadComponent
A component that reads data from a data system.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane BigQueryDataPlane DatabricksDataPlane | No | Data Plane-specific configuration options for a component. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
description | | string | No | A brief description of what the model does. |
metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
name | | string | Yes | The name of the model. |
flow_name | | string | No | The name of the flow that the component belongs to. |
data_maintenance | | | No | The data maintenance configuration options for the component. |
tests | | | No | Defines tests to run on the data of this component. |
read | | One of: GenericFileReadComponent LocalFileReadComponent SFTPReadComponent S3ReadComponent GcsReadComponent AbfsReadComponent HttpReadComponent MSSQLReadComponent MySQLReadComponent OracleReadComponent PostgresReadComponent SnowflakeReadComponent BigQueryReadComponent DatabricksReadComponent | Yes | The read component that reads data from a data system. |
BigQueryReadComponent
A component that reads data from a BigQuery table.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
connection | | string | No | The name of the connection to use for reading data. |
columns | | array[None] | No | A list specifying the columns to read from the source and transformations to make during read. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. |
strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
read_options | | | No | Options for reading from the database or warehouse. |
bigquery | | Any of: | Yes | |
DatabricksReadComponent
A component that reads data from a Databricks table.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
connection | | string | No | The name of the connection to use for reading data. |
columns | | array[None] | No | A list specifying the columns to read from the source and transformations to make during read. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. |
strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
read_options | | | No | Options for reading from the database or warehouse. |
databricks | | Any of: | Yes | |
MSSQLReadComponent
A component that reads data from an MSSQL Server database. Options include ingesting a single table or query, or multiple tables or queries.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
connection | | string | No | The name of the connection to use for reading data. |
columns | | array[None] | No | A list specifying the columns to read from the source and transformations to make during read. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. |
strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
read_options | | | No | Options for reading from the database or warehouse. |
mssql | | Any of: | Yes | |
MySQLReadComponent
A component that reads data from a MySQL database. Options include ingesting a single table or query, or multiple tables or queries.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
connection | | string | No | The name of the connection to use for reading data. |
columns | | array[None] | No | A list specifying the columns to read from the source and transformations to make during read. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. |
strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
read_options | | | No | Options for reading from the database or warehouse. |
use_duckdb | | boolean | No | Use DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False. |
mysql | | Any of: | Yes | MySQL read options. |
use_checksum | | boolean | No | Use the table checksum to detect data changes. If false or unset, a full sync re-reads the entire table on every run. |
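The idea behind `use_checksum` can be illustrated with a small Python sketch: compute a checksum over the table contents, compare it with the value stored from the previous run, and re-read only when the two differ. The helper names `table_checksum` and `should_reread`, the SHA-256 digest, and the in-memory rows are all assumptions made for illustration. This page does not say how the checksum is actually computed or stored.

```python
import hashlib


def table_checksum(rows):
    """Order-independent digest of a list of row dicts (hypothetical helper)."""
    digest = hashlib.sha256()
    # Serialize each row with sorted column names, then sort the rows
    # so that row order does not change the checksum.
    for serialized in sorted(repr(sorted(row.items())) for row in rows):
        digest.update(serialized.encode("utf-8"))
    return digest.hexdigest()


def should_reread(rows, previous_checksum):
    """Return (reread_needed, current_checksum) for a full-sync run."""
    current = table_checksum(rows)
    return current != previous_checksum, current


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
first_run, checksum = should_reread(rows, previous_checksum=None)
unchanged_run, _ = should_reread(list(reversed(rows)), previous_checksum=checksum)
rows[1]["name"] = "c"
changed_run, _ = should_reread(rows, previous_checksum=checksum)
```

In this sketch the first run always reads because no checksum has been stored yet, an unchanged table is skipped, and any changed value triggers a re-read.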
OracleReadComponent
A component that reads data from an Oracle table.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
connection | | string | No | The name of the connection to use for reading data. |
columns | | array[None] | No | A list specifying the columns to read from the source and transformations to make during read. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. |
strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
read_options | | | No | Options for reading from the database or warehouse. |
oracle | Oracle | Any of: | No | Oracle read options. |
PostgresReadComponent
A component that reads data from a PostgreSQL table.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
connection | | string | No | The name of the connection to use for reading data. |
columns | | array[None] | No | A list specifying the columns to read from the source and transformations to make during read. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. |
strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
read_options | | | No | Options for reading from the database or warehouse. |
use_duckdb | | boolean | No | Use DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False. |
postgres | Postgres | Any of: | No | Postgres read options. |
SnowflakeReadComponent
A component that reads data from a Snowflake table.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
event_time | | string | No | Timestamp column in the component output used to represent event time. |
connection | | string | No | The name of the connection to use for reading data. |
columns | | array[None] | No | A list specifying the columns to read from the source and transformations to make during read. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. |
strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
read_options | | | No | Options for reading from the database or warehouse. |
snowflake | | Any of: | Yes | |
IncrementalReadStrategy
Incremental read strategy for database read components. It combines the replication strategy, which defines how new data is read from the source, with the incremental strategy, which defines how that new data is materialized in the output.
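The two halves of this strategy can be pictured as two separate steps. The Python sketch below is a rough illustration under stated assumptions, not the platform's implementation: the replication step is modelled as a filter on a cursor column, the materialization step as a merge on a unique key, and the names `read_new_rows`, `materialize_merge`, `updated_at`, and `id` are hypothetical.

```python
def read_new_rows(source, cursor_column, last_seen):
    """Replication step (assumed): pick only rows newer than the last cursor value."""
    return [
        row for row in source
        if last_seen is None or row[cursor_column] > last_seen
    ]


def materialize_merge(output, new_rows, unique_key):
    """Incremental step (assumed): upsert the new rows into the existing output."""
    by_key = {row[unique_key]: row for row in output}
    for row in new_rows:
        # Matching keys are overwritten, unseen keys are appended.
        by_key[row[unique_key]] = row
    return list(by_key.values())


# Output as materialized by the previous run.
output = [
    {"id": 1, "updated_at": 1, "status": "new"},
    {"id": 2, "updated_at": 2, "status": "new"},
]
# Current state of the source table.
source = [
    {"id": 1, "updated_at": 1, "status": "new"},
    {"id": 2, "updated_at": 5, "status": "shipped"},
    {"id": 3, "updated_at": 6, "status": "new"},
]

last_seen = max(row["updated_at"] for row in output)
new_rows = read_new_rows(source, "updated_at", last_seen)
output = materialize_merge(output, new_rows, "id")
```

Only the rows with `id` 2 and 3 are read in this run, and the merge step updates the first and appends the second.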
Property | Default | Type | Required | Description |
---|---|---|---|---|
replication | | One of: Any of: cdc Any of: incremental | No | Replication strategy to use for data synchronization. |
incremental | Any of: |