MSSQL Read Component
A component that reads data from a MSSQL Server database, options include ingesting a single table / query, or multiple tables / queries.
Examples
- mssql_read_multiple_tables_schemas.yaml
- mssql_read_incremental_merge.yaml
- mssql_read_incremental.yaml
component:
read:
connection: my-mssql-connection
mssql:
tables:
- name: table1
schema: schema1
- name: table2
schema: schema2
- name: table3
schema: schema3
component:
read:
connection: my-mssql-connection
mssql:
table:
name: my_table
schema: my_schema
strategy:
replication:
incremental:
column_name: last_modified # Column used for tracking incremental updates to the data.
incremental:
merge:
unique_key: unique_id_column # Unique key for identifying records.
component:
read:
connection: my-mssql-connection
mssql:
query: SELECT * FROM my_table
strategy:
replication:
incremental:
column_name: my_incremental_column # Column used for tracking incremental updates to the data.
start_value: 100 # Initial start value for incremental read.
incremental: append # Specifies that new data should be appended incrementally.
MSSQLReadComponent
MSSQLReadComponent
is defined beneath the following ancestor nodes in the YAML structure:
Below are the properties for the MSSQLReadComponent
. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | array[None] | No | List of dependencies that must complete before this Component runs. | |
event_time | string | No | Timestamp column in the Component output used to represent Event time. | |
connection | string | No | Name of the Connection to use for reading data. | |
columns | array[None] | No | List specifying the columns to read from the source and transformations to make during read. | |
normalize | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading. | |
preserve_case | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading. | |
uppercase | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading. | |
strategy | Any of: full IncrementalReadStrategy PartitionedStrategy | No | Ingest strategy options. | |
read_options | No | Options for reading from the database or warehouse. | ||
mssql | Any of: | Yes |
Property Details
Component
A Component is a fundamental building block of a data Flow. Supported Component types include: Read, Transform, Task, Test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | One of: CustomPythonReadComponent ApplicationComponent AliasedTableComponent ExternalTableComponent FivetranComponent | Yes | Component configuration options. |
ReadComponent
Component that reads data from a system.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | One of: SnowflakeDataPlane BigQueryDataPlane DuckdbDataPlane DatabricksDataPlane | No | Data Plane-specific configuration options for Components. | |
skip | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. | |
retry_strategy | No | Retry strategy configuration options for the Component if any exceptions are encountered. | ||
description | string | No | Brief description of what the model does. | |
metadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. | ||
name | string | Yes | The name of the model | |
flow_name | string | No | Name of the Flow that the Component belongs to. | |
data_maintenance | No | The data maintenance configuration options for the Component. | ||
tests | No | Defines tests to run on this Component's data. | ||
read | One of: GenericFileReadComponent LocalFileReadComponent SFTPReadComponent S3ReadComponent GcsReadComponent AbfsReadComponent HttpReadComponent MSSQLReadComponent MySQLReadComponent OracleReadComponent PostgresReadComponent SnowflakeReadComponent BigQueryReadComponent DatabricksReadComponent | Yes | Read component that reads data from a system. |
ComponentColumn
Component column expression definition.
No properties defined.
DatabaseReadOptions
Options for reading from a database or warehouse.
Property | Default | Type | Required | Description |
---|---|---|---|---|
chunk_size | 100000 | integer | No | Number of rows to read from the table at a time. |
parallel_read | No | Options for reading from the source in parallel. |
IncrementalReadStrategy
Incremental read strategy for database Read Components - this is a combination of the replication strategy that defines how new data is read from the source, and the incremental strategy that defines how this new data is materialized in the output.
Property | Default | Type | Required | Description |
---|---|---|---|---|
replication | One of: Any of: cdc Any of: incremental | No | Replication strategy to use for data synchronization. | |
incremental | Any of: append MergeStrategy | Yes | Incremental processing strategy. | |
on_schema_change | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided. |
CdcReplication
Specifies if Change Data Capture (CDC) is the replication strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
cdc | No | Resource for change data capture (CDC), enabling incremental data capture based on changes. |
CdcOptions
No properties defined.
IncrementalReplication
Specifies if incremental data reading is the replication strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
incremental | No | Resource for incremental data reading based on a specific column. |
IncrementalColumn
Specifies the column to be used for incremental reading.
Property | Default | Type | Required | Description |
---|---|---|---|---|
column_name | string | Yes | Name of the column to use for tracking incremental updates to the data. | |
start_value | Any of: string integer number string | No | Initial value to start reading data from the specified column. |
MultipleQueries
Options to define one or more arbitrary select statements. The output of the queries will be unioned together, and must return the same database schema.
Property | Default | Type | Required | Description |
---|---|---|---|---|
queries | array[string] | No | List of SQL queries to execute for reading data. |
MultipleTablesWithSchema
Options for reading from multiple tables in a specific schema.
Property | Default | Type | Required | Description |
---|---|---|---|---|
tables | array[None] | Yes | List of tables (in specified schemas) to read data from. |
ParallelReadOptions
Options for reading from a source in parallel. Ascend will logically partition the source data based on the partition column and max partitions, and then read each partition in parallel. If lower and upper bounds for the partition column are provided, they are used as hints to guide partitioning by dividing the range into max_partitions
roughly equal partitions.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_column | string | Yes | The name of the column to partition the data by. Select a column that could be used to partition the data into smaller chunks and that results in the lowest skew between partitions in terms of record count. This column must either be an integer or timestamp column. | |
max_partitions | -1 | integer | No | Maximum number of partitions to read concurrently from the source. Since this translates directly to the number of concurrent Connections to the source, care should be taken to select a value that does not exceed the source system's Connection or other resource limits. A value of -1 means that the value is chosen automatically based on the source. |
partition_lower_bound | Any of: integer string string | No | Lower bound of the partition column. If not provided, the minimum value of the partition column will be used. | |
partition_upper_bound | Any of: integer string string | No | Upper bound of the partition column. If not provided, the maximum value of the partition column will be used. |
SingleQuery
Options to define an arbitrary select statement.
Property | Default | Type | Required | Description |
---|---|---|---|---|
query | string | No | SQL query to execute for reading data. |
PartitionedStrategy
Partitioned Ingest Strategy. The user is expected to provide 2 functions, a list function that lists partitions in the source, and a read function that reads a partition from the source.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partitioned | No | Options for partitioning data. | ||
on_schema_change | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided. |