MySQL Read Component
A component that reads data from a MySQL database, options include ingesting a single table / query, or multiple tables / queries.
Examples
- mysql_read_employees.yaml
- mysql_read_component_multiple_tables.yaml
- mysql_read_incremental_materialization_with_comments.yaml
component:
read:
connection: my-mysql-connection
mysql:
table:
name: employees
component:
read:
connection: my-mysql-connection
mysql:
tables:
- name: table1
- name: table2
- name: table3
# MySQLReadComponent configuration with incremental replication and materialization
component:
read:
# Connection to the MySQL database
connection: my-mysql-connection
mysql:
# Table to read data from
table:
name: my_table
replication:
# Incremental replication strategy
incremental:
# Column used for tracking updates
column_name: my_incremental_column
# Initial start value for incremental read
start_value: 0
materialization:
# Merge strategy for materialization
merge:
# Unique key for identifying records
unique_key: my_unique_key
# Mode for merging data
mode: latest
MySQLReadComponent
MySQLReadComponent
is defined beneath the following ancestor nodes in the YAML structure:
Below are the properties for the MySQLReadComponent
. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
materialization | MergeMaterialization | No | Strategy to use for data materialization during the read process. | |
replication | One of: Any of: string ("cdc") CdcReplication Any of: string ("incremental") IncrementalReplication | No | Replication strategy to use for data synchronization. | |
event_time | string | No | Timestamp column in the component output used to represent event time. | |
connection | string | No | The name of the connection to use for reading data. | |
columns | array[ComponentColumn] | No | A list specifying the columns to read from the source and transformations to make during read. | |
normalize | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading. | |
preserve_case | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading. | |
uppercase | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading. | |
chunk_size | integer | No | Size of chunks to read from the table at a time. | |
mysql | Any of: SingleTable MultipleTables SingleQuery MultipleQueries | Yes | MySQL read options. | |
use_checksum | boolean | No | Use table checksum to detect data changes. if false or unset, will do full re-read for every run for full-sync. |
Property Details
Component
A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | One of: ReadComponent TransformComponent TaskComponent SingularTestComponent CustomPythonReadComponent WriteComponent CompoundComponent AliasedTableComponent ExternalTableComponent | Yes | Configuration options for the component. |
ReadComponent
A component that reads data from a data system.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | One of: SnowflakeDataPlane BigQueryDataPlane DuckdbDataPlane SynapseDataPlane | No | Data Plane-specific configuration options for a component. | |
name | string | No | The name of the model | |
description | string | No | A brief description of what the model does. | |
metadata | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. | |
flow_name | string | No | The name of the flow that the component belongs to. | |
skip | boolean | No | A boolean flag indicating whether to skip processing for the component or not. | |
data_maintenance | DataMaintenance | No | The data maintenance configuration options for the component. | |
skip_for_time_series_runs | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. | |
tests | ComponentTestColumn | No | Defines tests to run on the data of this component. | |
read | One of: GenericFileReadComponent LocalFileReadComponent S3ReadComponent GcsReadComponent AbfsReadComponent HttpReadComponent MSSQLReadComponent MySQLReadComponent OracleReadComponent PostgresReadComponent SnowflakeReadComponent BigQueryReadComponent | Yes | The read component that reads data from a data system. |
MultipleTables
Options from reading from multiple tables. Useful for reading from multiple tables in a single component.
Property | Default | Type | Required | Description |
---|---|---|---|---|
tables | array[TableOptions] | Yes | List of tables to read data from. |
CdcReplication
Specifies if Change Data Capture (CDC) is the replication strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
cdc | CdcOptions | No | Resource for Change Data Capture (CDC), enabling incremental data capture based on changes. |
CdcOptions
No properties defined.
ComponentColumn
Component column expression definition.
No properties defined.
IncrementalReplication
Specifies if incremental data reading is the replication strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
incremental | IncrementalColumn | No | Resource for incremental data reading based on a specific column. |
IncrementalColumn
Specifies the column to be used for incremental reading.
Property | Default | Type | Required | Description |
---|---|---|---|---|
column_name | string | Yes | Name of the column to use for tracking incremental updates to the data. | |
start_value | Any of: string integer number string | No | Initial value to start reading data from the specified column. |
MergeMaterialization
Specifies that the component should merge to materialize the data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
merge | MergeOptions | No | Resource for merging data, including handling deletions and identifying unique records. |
MergeOptions
Resource options for data merging, including mode selection and criteria for detecting deletions and unique records.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | ModeEnum | Yes | Specifies the mode to use for merging data: 'history' to keep all versions, 'latest' to keep only the latest. | |
deletion_col | string | No | Column name to use for identifying deleted records. | |
unique_key | string | Yes | Column or set of columns used as a unique identifier for records, aiding in the merge process. | |
on_schema_change | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. |
ModeEnum
No properties defined.
MultipleQueries
Options to define one or more arbitrary select statements. The output of the queries will be unioned together, and must return the same database schema.
Property | Default | Type | Required | Description |
---|---|---|---|---|
queries | array[string] | No | List of SQL queries to execute for reading data. |
SingleQuery
Options to define an arbitrary select statement.
Property | Default | Type | Required | Description |
---|---|---|---|---|
query | string | No | SQL query to execute for reading data. |
SingleTable
Options for reading from a single table. The typical way of reading tables from a database.
Property | Default | Type | Required | Description |
---|---|---|---|---|
table | TableOptions | Yes | Table to read data from. |
TableOptions
Options for reading from a specific table.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | string | Yes | Name of the table to be read. |