Postgres Read Component
Component that reads data from a PostgreSQL table.
Examples
- postgres_read_component_config.yaml
- postgres_merge_materialization.yaml
- postgres_read_incremental.yaml
# postgres_read_component_config.yaml
component:
  read:
    connection: my-postgres-connection
    postgres:
      table:
        name: my_table
        schema: my_schema
# postgres_merge_materialization.yaml
component:
  read:
    postgres:
      table:
        name: my_table
        schema: public
    connection: my-postgres-connection
    strategy:
      incremental:
        merge:
          unique_key: id # Column or set of columns used as a unique identifier for records.
          deletion_column: deleted_at # Column name used for soft-deleting records, if applicable.
      on_schema_change: append_new_columns
# postgres_read_incremental.yaml
component:
  read:
    connection: my-postgres-connection
    strategy:
      replication:
        incremental:
          column_name: updated_at # Name of the column to use for tracking incremental updates to the data.
      incremental: append # Specifies that new data should be appended incrementally.
    postgres:
      tables:
        - name: table1
          schema: public
        - name: table2
          schema: public
PostgresReadComponent
PostgresReadComponent is defined beneath the following ancestor nodes in the YAML structure: Component, ReadComponent.
Below are the properties for the PostgresReadComponent. Each property links to the specific details section further down in this page.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
| event_time | | string | No | Timestamp column in the Component output used to represent Event time. |
| connection | | string | No | Name of the Connection to use for reading data. |
| columns | | array[None] | No | List specifying the columns to read from the source and transformations to make during read. |
| normalize | | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading. |
| preserve_case | | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading. |
| uppercase | | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading. |
| strategy | | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options. |
| read_options | | | No | Options for reading from the database or warehouse. |
| use_duckdb | | boolean | No | Use DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False. |
| postgres | Postgres | Any of: | No | PostgreSQL read options. |
| arrays_as_json | | boolean | No | Ingest PostgreSQL arrays as JSON. |
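
As a rough sketch of how a few of the properties above might sit together, the fragment below sets them alongside `connection` and `postgres`. Placing them directly under `read` follows the examples at the top of this page; the column name `created_at` and the chosen flag values are illustrative assumptions, not recommended settings.

```yaml
component:
  read:
    connection: my-postgres-connection
    event_time: created_at   # assumed timestamp column in the Component output
    normalize: true          # normalize output column names after reading
    use_duckdb: true         # faster reads; may hit memory limits on very large tables
    postgres:
      table:
        name: my_table
        schema: public
```
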
Property Details
Component
A Component is a fundamental building block of a data Flow. Supported Component types include: Read, Transform, Task, Test, and more.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Component configuration options. |
ReadComponent
Component that reads data from a system.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| flow_name | | string | No | Name of the Flow that the Component belongs to. |
| data_maintenance | | | No | The data maintenance configuration options for the Component. |
| tests | | | No | Defines tests to run on this Component's data. |
| read | | One of: GenericFileReadComponent, LocalFileReadComponent, SFTPReadComponent, S3ReadComponent, GcsReadComponent, AbfsReadComponent, HttpReadComponent, MSSQLReadComponent, MySQLReadComponent, OracleReadComponent, PostgresReadComponent, SnowflakeReadComponent, BigQueryReadComponent, DatabricksReadComponent | Yes | Read component that reads data from a system. |
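
To illustrate where the ReadComponent properties live relative to `read`, here is a minimal sketch that assumes they are siblings of the `read` key under `component`, as the table above suggests. The name `my_postgres_read` and the description text are hypothetical.

```yaml
component:
  name: my_postgres_read                        # hypothetical Component name
  description: Reads my_table from PostgreSQL.  # hypothetical description
  skip: false                                   # set to true to skip processing
  read:
    connection: my-postgres-connection
    postgres:
      table:
        name: my_table
        schema: public
```
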
ComponentColumn
Component column expression definition.
No properties defined.
DatabaseReadOptions
Options for reading from a database or warehouse.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| chunk_size | 100000 | integer | No | Number of rows to read from the table at a time. |
| parallel_read | | | No | Options for reading from the source in parallel. |
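
A small sketch of overriding `chunk_size`, assuming these options are supplied through the `read_options` property listed for PostgresReadComponent. The value 50000 is only an example; `parallel_read` is omitted because its sub-properties are not described on this page.

```yaml
component:
  read:
    connection: my-postgres-connection
    read_options:
      chunk_size: 50000   # rows read per batch; the documented default is 100000
    postgres:
      table:
        name: my_table
        schema: public
```
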
IncrementalReadStrategy
Incremental read strategy for database Read Components. It combines the replication strategy, which defines how new data is read from the source, with the incremental strategy, which defines how that new data is materialized in the output.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| replication | | One of: cdc, incremental | No | Replication strategy to use for data synchronization. |
| incremental | | Any of: append, MergeStrategy | Yes | Incremental processing strategy. |
| on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns", "smart") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided. |
CdcReplication
Specifies if Change Data Capture (CDC) is the replication strategy.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| cdc | | | No | Resource for change data capture (CDC), enabling incremental data capture based on changes. |
CdcOptions
No properties defined.
IncrementalReplication
Specifies if incremental data reading is the replication strategy.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| incremental | | | No | Resource for incremental data reading based on a specific column. |
IncrementalColumn
Specifies the column to be used for incremental reading.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| column_name | | string | Yes | Name of the column to use for tracking incremental updates to the data. |
| start_value | | Any of: string, integer, number, string | No | Initial value to start reading data from the specified column. |
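
The fragment below sketches how `start_value` might be combined with `column_name`, reusing the nesting from the incremental example at the top of this page. The date string is an illustrative assumption; the accepted format for a given column type is not specified here.

```yaml
component:
  read:
    connection: my-postgres-connection
    strategy:
      replication:
        incremental:
          column_name: updated_at
          start_value: "2024-01-01"   # assumed initial value to start reading from
      incremental: append
    postgres:
      table:
        name: my_table
        schema: public
```
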
MultipleQueries
Options to define one or more arbitrary select statements. The query outputs are unioned together, so every query must return the same schema.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| queries | | array[string] | No | List of SQL queries to execute for reading data. |
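
As a hedged sketch, the following assumes that `queries` is placed under the `postgres` key in the same way `table` and `tables` are in the earlier examples. Both queries select the same columns so that their results can be unioned; the table and column names are hypothetical.

```yaml
component:
  read:
    connection: my-postgres-connection
    postgres:
      queries:
        # Each query returns the same columns so the outputs can be unioned.
        - SELECT id, updated_at FROM public.table1
        - SELECT id, updated_at FROM public.table2
```
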
MultipleTablesWithSchema
Options for reading from multiple tables in a specific schema.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| tables | | array[None] | Yes | List of tables (in specified schemas) to read data from. |