Merge Strategy
Strategy that involves merging new data with existing data by updating existing records that match the unique key.
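The merge behavior described above can be illustrated with a small, platform-independent sketch. It is not the product's implementation. The function name `merge_records`, the `unique_key` parameter, and the sample records are hypothetical. The sketch also assumes that incoming records with no matching key are inserted, which the description above does not state.

```python
from typing import Any

Record = dict[str, Any]


def merge_records(existing: list[Record], incoming: list[Record], unique_key: str) -> list[Record]:
    """Merge incoming records into existing ones, matching on unique_key.

    Hypothetical illustration only: a matching record is replaced by the
    incoming one; a non-matching record is appended (assumed behavior).
    """
    # Index the existing records by their key; dicts preserve insertion order.
    merged = {row[unique_key]: dict(row) for row in existing}
    for row in incoming:
        # Overwrites the stored record on a key match, adds a new entry otherwise.
        merged[row[unique_key]] = dict(row)
    return list(merged.values())


existing_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
incoming_rows = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
merged_rows = merge_records(existing_rows, incoming_rows, unique_key="id")
# merged_rows keeps id 1 unchanged, updates id 2, and adds id 3.
```

Keying a dictionary on the unique key keeps the sketch short and preserves the original row order for updated records. A real warehouse would typically express the same idea with a SQL `MERGE` statement.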
MergeStrategy
MergeStrategy is defined beneath the following ancestor nodes in the YAML structure:
- Component
- CustomPythonReadComponent
- CustomPythonReadOptions
- ReadComponent
- BigQueryReadComponent
- DatabricksReadComponent
- MSSQLReadComponent
- MySQLReadComponent
- OracleReadComponent
- PostgresReadComponent
- SnowflakeReadComponent
- IncrementalReadStrategy
- TransformComponent
- PySparkTransform
- PythonTransform
- SnowparkTransform
- SqlTransform
- IncrementalStrategy
- WriteComponent
- BigQueryWriteComponent
- MySQLWriteComponent
- OracleWriteComponent
- PostgresWriteComponent
- SnowflakeWriteComponent
- IncrementalWriteStrategyWithSchemaChange
Below are the properties for the MergeStrategy. Each property links to the specific details section further down on this page.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| merge | | | No | Options for merge strategy. |
Property Details
Component
A Component is a fundamental building block of a data Flow. Supported Component types include: Read, Transform, Task, Test, and more.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| component | | One of: CustomPythonReadComponent ApplicationComponent AliasedTableComponent ExternalTableComponent DbtNodeComponent | Yes | Component configuration options. |
CustomPythonReadComponent
Component that reads data using user-defined custom Python code.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | One of: SnowflakeDataPlane BigQueryDataPlane DuckdbDataPlane DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| data_maintenance | | | No | The data maintenance configuration options for the Component. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model |
| flow_name | | string | No | Name of the Flow that the Component belongs to. |
| tests | | | No | Defines tests to run on this Component's data. |
| custom_python_read | | | Yes | |
CustomPythonReadOptions
Configuration options for the Custom Python Read Component.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
| event_time | | string | No | Timestamp column in the Component output used to represent Event time. |
| strategy | full | Any of: full IncrementalStrategy PartitionedStrategy | No | Ingest strategy. |
| python | | Any of: | Yes | Python code to execute for ingesting data. |
ReadComponent
Component that reads data from a system.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | One of: SnowflakeDataPlane BigQueryDataPlane DuckdbDataPlane DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| data_maintenance | | | No | The data maintenance configuration options for the Component. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model |
| flow_name | | string | No | Name of the Flow that the Component belongs to. |
| tests | | | No | Defines tests to run on this Component's data. |
| read | | One of: GenericFileReadComponent LocalFileReadComponent SFTPReadComponent S3ReadComponent GcsReadComponent AbfsReadComponent HttpReadComponent MSSQLReadComponent MySQLReadComponent OracleReadComponent PostgresReadComponent SnowflakeReadComponent BigQueryReadComponent DatabricksReadComponent | Yes | Read component that reads data from a system. |
BigQueryReadComponent
Component that reads data from a BigQuery table.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
| event_time | | string | No | Timestamp column in the Component output used to represent Event time. |
| connection | | string | No | Name of the Connection to use for reading data. |
| columns | | array[None] | No | List specifying the columns to read from the source and transformations to make during read. |
| normalize | | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading. |
| preserve_case | | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading. |
| uppercase | | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading. |
| strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
| read_options | | | No | Options for reading from the database or warehouse. |
| bigquery | | Any of: | Yes | |
DatabricksReadComponent
Component that reads data from a Databricks table.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
| event_time | | string | No | Timestamp column in the Component output used to represent Event time. |
| connection | | string | No | Name of the Connection to use for reading data. |
| columns | | array[None] | No | List specifying the columns to read from the source and transformations to make during read. |
| normalize | | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading. |
| preserve_case | | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading. |
| uppercase | | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading. |
| strategy | | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. |
| read_options | | | No | Options for reading from the database or warehouse. |
| databricks | | Any of: | Yes | |
MSSQLReadComponent
Component that reads data from an MSSQL Server database. Options include ingesting a single table or query, or multiple tables or queries.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
| event_time | | string | No | Timestamp column in the Component output used to represent Event time. |
| connection | | string | No | Name of the Connection to use for reading data. |
| columns | | array[None] | No | List specifying the columns to read from the source and transformations to make during read. |
| normalize | | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading. |
| preserve_case | | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading. |
| uppercase | | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading. |
| strategy | Any of: full IncrementalReadStrategy |