# Oracle Read Component

A component that reads data from an Oracle table.
## Examples

- oracle_read_component_config.yaml
- oracle_read_component_config_no_normalize.yaml
- oracle_read_component.yaml
- oracle_read_cdc_direct.yaml
```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      table:
        name: my_table
        schema: my_schema
```
```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      queries:
        - SELECT column1, column2 FROM table1
        - SELECT column1, column2 FROM table2
```
```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      tables:
        - name: table1
          schema: schema1
        - name: table2
          schema: schema1
```
```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      table:
        name: my_table
        schema: my_schema
    strategy:
      replication: cdc      # Specifies the Change Data Capture (CDC) replication strategy.
      incremental: append   # Specifies that changes from CDC should be appended incrementally.
```
## OracleReadComponent

`OracleReadComponent` is defined beneath the following ancestor nodes in the YAML structure: Component → ReadComponent (both are described under Property Details below).

Below are the properties for the `OracleReadComponent`. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description
---|---|---|---|---
event_time | | string | No | Timestamp column in the component output used to represent event time.
connection | | string | No | The name of the connection to use for reading data.
columns | | array[ComponentColumn] | No | A list specifying the columns to read from the source and transformations to make during read.
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy | | Any of: string, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options | | DatabaseReadOptions | No | Options for reading from the database or warehouse.
oracle | | Oracle (Any of: SingleTableWithSchema, MultipleTablesWithSchema, SingleQuery, MultipleQueries) | No | Oracle read options.
## Property Details
### Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

Property | Default | Type | Required | Description
---|---|---|---|---
component | | One of: ReadComponent, TransformComponent, TaskComponent, SingularTestComponent, CustomPythonReadComponent, WriteComponent, CompoundComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component.
### ReadComponent

A component that reads data from a data system.

Property | Default | Type | Required | Description
---|---|---|---|---
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane, FabricDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component.
description | | string | No | A brief description of what the model does.
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name | | string | Yes | The name of the model.
flow_name | | string | No | The name of the flow that the component belongs to.
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not.
data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the component.
tests | | ComponentTestOptions | No | Defines tests to run on the data of this component.
read | | One of: GenericFileReadComponent, LocalFileReadComponent, SFTPReadComponent, S3ReadComponent, GcsReadComponent, AbfsReadComponent, HttpReadComponent, MSSQLReadComponent, MySQLReadComponent, OracleReadComponent, PostgresReadComponent, SnowflakeReadComponent, BigQueryReadComponent, DatabricksReadComponent | Yes | The read component that reads data from a data system.
### ComponentColumn

Component column expression definition.

No properties defined.
### DatabaseReadOptions

Options for reading from a database or warehouse.

Property | Default | Type | Required | Description
---|---|---|---|---
chunk_size | 100000 | integer | No | Number of rows to read from the table at a time.
parallel_read | | ParallelReadOptions | No | Options for reading from the source in parallel.
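As a minimal sketch, read options sit alongside the Oracle options on the read component. The connection, table, and chunk size below are illustrative placeholders:

```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      table:
        name: my_table
        schema: my_schema
    read_options:
      chunk_size: 50000  # fetch 50,000 rows at a time instead of the 100,000 default
```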
### IncrementalReadStrategy

Incremental read strategy for database read components. This is a combination of the replication strategy, which defines how new data is read from the source, and the incremental strategy, which defines how this new data is materialized in the output.

Property | Default | Type | Required | Description
---|---|---|---|---
replication | | Any of: string, CdcReplication, IncrementalReplication | No | Replication strategy to use for data synchronization.
incremental | | Any of: string, MergeStrategy, SCDType2Strategy | Yes | Incremental processing strategy.
on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.
### CdcReplication

Specifies if Change Data Capture (CDC) is the replication strategy.

Property | Default | Type | Required | Description
---|---|---|---|---
cdc | | CdcOptions | No | Resource for Change Data Capture (CDC), enabling incremental data capture based on changes.

### CdcOptions

No properties defined.
### IncrementalReplication

Specifies if incremental data reading is the replication strategy.

Property | Default | Type | Required | Description
---|---|---|---|---
incremental | | IncrementalColumn | No | Resource for incremental data reading based on a specific column.
### IncrementalColumn

Specifies the column to be used for incremental reading.

Property | Default | Type | Required | Description
---|---|---|---|---
column_name | | string | Yes | Name of the column to use for tracking incremental updates to the data.
start_value | | Any of: string, integer, number | No | Initial value to start reading data from the specified column.
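Pulling the pieces above together, an incremental read keyed on a timestamp column might look like the following sketch, assembled from the property tables on this page; the connection, column name, and start value are illustrative:

```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      table:
        name: my_table
        schema: my_schema
    strategy:
      replication:
        incremental:
          column_name: updated_at    # column tracked for new rows
          start_value: "2024-01-01"  # illustrative initial watermark
      incremental: append
```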
### MultipleQueries

Options to define one or more arbitrary select statements. The outputs of the queries are unioned together, so every query must return the same database schema.

Property | Default | Type | Required | Description
---|---|---|---|---
queries | | array[string] | No | List of SQL queries to execute for reading data.
### MultipleTablesWithSchema

Options for reading from multiple tables in a specific schema.

Property | Default | Type | Required | Description
---|---|---|---|---
tables | | array[TableWithSchemaOptions] | Yes | List of tables (in specified schemas) to read data from.
### ParallelReadOptions

Options for reading from a source in parallel. Ascend will logically partition the source data based on the partition column and max partitions, and then read each partition in parallel. If lower and upper bounds for the partition column are provided, they are used as hints to guide partitioning by dividing the range into max_partitions roughly equal partitions.

Property | Default | Type | Required | Description
---|---|---|---|---
partition_column | | string | Yes | The name of the column to partition the data by. Select a column that could be used to partition the data into smaller chunks and that results in the lowest skew between partitions in terms of record count. This column must either be an integer or timestamp column.
max_partitions | -1 | integer | No | The maximum number of partitions to read concurrently from the source. Since this translates directly to the number of concurrent connections to the source, care should be taken to select a value that does not exceed the source system's connection or other resource limits. A value of -1 means that the value is chosen automatically based on the source.
partition_lower_bound | | Any of: integer, string | No | The lower bound of the partition column. If not provided, the minimum value of the partition column will be used.
partition_upper_bound | | Any of: integer, string | No | The upper bound of the partition column. If not provided, the maximum value of the partition column will be used.
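For illustration, parallel reads are enabled through read_options; the sketch below assumes an integer order_id column that partitions the table with low skew, and all names and bounds are placeholders:

```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      table:
        name: my_table
        schema: my_schema
    read_options:
      parallel_read:
        partition_column: order_id   # must be an integer or timestamp column
        max_partitions: 8            # keep below the source's connection limits
        partition_lower_bound: 1     # optional hints for partition boundaries
        partition_upper_bound: 1000000
```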
### SingleQuery

Options to define an arbitrary select statement.

Property | Default | Type | Required | Description
---|---|---|---|---
query | | string | No | SQL query to execute for reading data.
### PartitionedStrategy

Partitioned ingest strategy. The user is expected to provide two functions: a list function that lists partitions in the source, and a read function that reads a partition from the source.

Property | Default | Type | Required | Description
---|---|---|---|---
partitioned | | PartitionedOptions | No | Options for partitioning data.
on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.
### PartitionedOptions

Options related to partition optimization, in particular the policy that determines which partitions to ingest.

Property | Default | Type | Required | Description
---|---|---|---|---
enable_substitution_by_partition_name | | boolean | Yes | Enable substitution by partition name.
output_type | table | string ("table", "view") | No | Output type for partitioned data. Must be either 'table' or 'view'. This strategy applies only to Transforms.
### SCDType2Strategy

The SCD Type 2 strategy allows users to track changes to records over time, by tracking the start and end times for each version of a record. A brief overview of the strategy can be found at https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row.

Property | Default | Type | Required | Description
---|---|---|---|---
scd_type_2 | | KeyOptions | No | Options for SCD Type 2 strategy.
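For illustration, SCD Type 2 materialization can be combined with incremental replication as in the sketch below, assembled from the property tables on this page; the key and tracking columns are placeholders:

```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      table:
        name: my_table
        schema: my_schema
    strategy:
      replication:
        incremental:
          column_name: updated_at  # column tracked for new versions of a record
      incremental:
        scd_type_2:
          unique_key: customer_id  # illustrative record identifier
```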
### MergeStrategy

A strategy that involves merging new data with existing data by updating existing records that match the unique key.

Property | Default | Type | Required | Description
---|---|---|---|---
merge | | KeyOptions | No | Options for merge strategy.
### KeyOptions

Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.

Property | Default | Type | Required | Description
---|---|---|---|---
unique_key | | string | Yes | Column or comma-separated set of columns used as a unique identifier for records, aiding in the merge process.
deletion_column | | string | No | Column name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy will be able to detect deletions and mark them as deleted in the destination. If not provided, the merge strategy will not be able to detect deletions.
merge_update_columns | | Any of: string, array[string] | No | List of columns to include when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_exclude_columns.
merge_exclude_columns | | Any of: string, array[string] | No | List of columns to exclude when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_update_columns.
incremental_predicates | | Any of: string, array[string] | No | List of conditions to filter incremental data.
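Putting KeyOptions to use, a merge-based incremental read might look like the sketch below; all column names are placeholders, and deletion_column only applies if the source soft-deletes records:

```yaml
component:
  read:
    connection: my-oracle-connection
    oracle:
      table:
        name: my_table
        schema: my_schema
    strategy:
      replication:
        incremental:
          column_name: updated_at
      incremental:
        merge:
          unique_key: order_id               # matches new rows to existing records
          deletion_column: is_deleted        # soft-delete flag in the source, if present
          merge_exclude_columns: created_at  # leave insert-time columns untouched on update
```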
### SingleTableWithSchema

Options for reading from a single table in a specific schema.

Property | Default | Type | Required | Description
---|---|---|---|---
table | | TableWithSchemaOptions | Yes | Table (in specified schema) to read data from.
### TableWithSchemaOptions

Options for reading from a specific table in a schema.

Property | Default | Type | Required | Description
---|---|---|---|---
name | | string | Yes | Name of the table to be read.
schema | | string | No | Schema of the table, if applicable.