Custom Python Read
A component that reads data using user-defined, custom Python code.
CustomPythonReadComponent
CustomPythonReadComponent is defined beneath the following ancestor nodes in the YAML structure:
Below are the properties for the CustomPythonReadComponent. Each property links to the specific details section further down on this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane BigQueryDataPlane DuckdbDataPlane SynapseDataPlane | No | Data Plane-specific configuration options for a component. |
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information about a resource. In most cases it does not affect system behavior, but it can help when analyzing project resources. |
flow_name | | string | No | The name of the flow that the component belongs to. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component. |
data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the component. |
skip_for_time_series_runs | | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. |
tests | | ComponentTestColumn | No | Defines tests to run on the data of this component. |
custom_python_read | | CustomPythonReadOptions | Yes | |
Property Details
Component
A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: ReadComponent TransformComponent TaskComponent SingularTestComponent CustomPythonReadComponent WriteComponent CompoundComponent AliasedTableComponent ExternalTableComponent | Yes | Configuration options for the component. |
CustomPythonReadOptions
Configuration options for the custom Python read component.
Property | Default | Type | Required | Description |
---|---|---|---|---|
event_time | | string | No | Timestamp column in the component output used to represent event time. |
entrypoint | | string | Yes | The entrypoint for the Python transform function. |
source | | string | Yes | The source file for the Python transform function. |
ingest_mode | full | string ("incremental", "full") | No | The ingest mode for the custom Python read connector ('incremental' or 'full'). Defaults to 'full'. |
materialization | | PartitionMaterialization | No | Configuration options for how data is materialized after it is read. |
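A minimal sketch of how these options might fit together, assuming the nesting implied by the tables on this page (`custom_python_read` under `component`). The file name `read_orders.py`, the function name `read_orders`, and the column `created_at` are hypothetical placeholders, not values taken from this reference.

```yaml
component:
  custom_python_read:
    # Hypothetical source file and function name; replace with your own.
    source: read_orders.py
    entrypoint: read_orders
    # Optional. 'full' is the documented default.
    ingest_mode: incremental
    # Optional. Hypothetical timestamp column in the component output.
    event_time: created_at
```

Only `entrypoint` and `source` are marked as required; omitting `ingest_mode` falls back to `full`.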
BigQueryDataPlane
Property | Default | Type | Required | Description |
---|---|---|---|---|
bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |
BigQueryDataPlaneOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | Any of: BigQueryRangePartitioning BigQueryTimePartitioning | No | Partition By clause for the table. |
cluster_by | | array[string] | No | Clustering keys to be added to the table. |
BigQueryRangePartitioning
Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
range | | RangeOptions | Yes | Range partitioning options. |
BigQueryTimePartitioning
Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
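As an illustration of the BigQuery options above, the fragment below sketches a `data_plane` block that uses time partitioning and clustering. The column names `created_at` and `customer_id` are hypothetical, and the fragment assumes it sits alongside the other component properties listed at the top of this page.

```yaml
data_plane:
  bigquery:
    partition_by:
      # Time partitioning: both field and granularity are required.
      field: created_at        # hypothetical column name
      granularity: DAY         # one of DAY, HOUR, MONTH, YEAR
    cluster_by:
      - customer_id            # hypothetical column name
```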
ComponentTestColumn
ColumnTestPython
Test to validate data using a Python function for a single column.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
name | | string | Yes | |
python | | ColumnTestPythonOptions | Yes | Configuration options for the Python column test. |
ColumnTestPythonOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
entrypoint | | string | Yes | The entrypoint for the Python transform function. |
source | | string | Yes | The source file for the Python transform function. |
params | | object | No | Parameters for the Python test function. |
is_asset_test | | boolean | No | |
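A sketch of a single Python column test object built from the two tables above. The test name, file path, function name, and the `min_value` parameter are all hypothetical. Where this object is placed under `tests` is not specified in this section, so only the test object itself is shown.

```yaml
name: amount_is_non_negative         # hypothetical test name
severity: warn                       # optional; defaults to 'error'
python:
  source: tests/column_checks.py     # hypothetical source file
  entrypoint: check_non_negative     # hypothetical function name
  params:
    min_value: 0                     # hypothetical parameter passed to the test function
```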
ColumnTestSql
Test to validate data using an SQL query for a single column.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
name | | string | Yes | |
sql | | string | No | SQL query that tests data for conditions. |
CombinationUniqueTest
Test to check if a value is unique.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
combination_unique | | CombinationUniqueTestOptions | Yes | Test to check if a value is unique. |
CombinationUniqueTestOptions
Configuration options for the unique test.
Property | Default | Type | Required | Description |
---|---|---|---|---|
columns | | array[string] | Yes | The combination of columns to check for uniqueness. |
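For example, a test object that checks uniqueness across a pair of columns could look like the sketch below. The column names `order_id` and `line_number` are hypothetical.

```yaml
severity: error            # optional; 'error' is the default
combination_unique:
  columns:
    - order_id             # hypothetical column name
    - line_number          # hypothetical column name
```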
ComponentSchemaTest
Test to validate that component columns match expected types.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
match | exact | string ("exact", "ignore_missing") | No | The type of schema matching to perform. 'exact' requires all columns to be present, 'ignore_missing' allows for missing columns. |
columns | | object with property values of type string | No | A mapping of column names to their expected types. |
CountDistinctEqualTest
Test to check if the number of distinct values is equal to a certain number.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
count_distinct_equal | | CountDistinctEqualTestOptions | Yes | Configuration options for the count_distinct_equal test. |
CountDistinctEqualTestOptions
Configuration options for the count_distinct_equal test.
Property | Default | Type | Required | Description |
---|---|---|---|---|
count | | integer | Yes | The number of distinct values to expect. |
group_by_columns | | array[string] | No | The columns to group by. |
CountEqualTest
Test to check if the number of rows is equal to a certain number.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
count_equal | | CountEqualTestOptions | Yes | Configuration options for the count_equal test. |
CountEqualTestOptions
Configuration options for the count_equal test.
Property | Default | Type | Required | Description |
---|---|---|---|---|
count | | integer | Yes | The number of rows to expect. |
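A sketch of a row-count test object using the properties above. The expected count of 100 is an arbitrary illustrative value.

```yaml
severity: error      # optional; 'error' is the default
count_equal:
  count: 100         # arbitrary example: expect exactly 100 rows
```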
CountGreaterThanOrEqualTest
Test to check if the number of rows is greater than or equal to a certain number.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
count_greater_than_or_equal | | CountGreaterThanOrEqualTestOptions | Yes | Configuration options for the count_greater_than_or_equal test. |
CountGreaterThanOrEqualTestOptions
Configuration options for the count_greater_than_or_equal test.
Property | Default | Type | Required | Description |
---|---|---|---|---|
count | | integer | Yes | The value to compare against. |
group_by_columns | | array[string] | No | The columns to group by. |
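The optional `group_by_columns` property can be combined with `count`, as in the sketch below. The column name `region` is hypothetical, and the reading that the threshold applies to each group is an assumption based on the property name rather than something stated in this reference.

```yaml
severity: warn                 # report as a warning instead of interrupting the flow
count_greater_than_or_equal:
  count: 1                     # the value to compare against
  group_by_columns:
    - region                   # hypothetical column name
```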
CountGreaterThanTest
Test to check if the number of rows is greater than a certain number.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
count_greater_than | | CountGreaterThanTestOptions | Yes | Configuration options for the count_greater_than test. |
CountGreaterThanTestOptions
Configuration options for the count_greater_than test.
Property | Default | Type | Required | Description |
---|---|---|---|---|
count | | integer | Yes | The value to compare against. |
group_by_columns | | array[string] | No | The columns to group by. |
CountLessThanOrEqualTest
Test to check if the number of rows is less than or equal to a certain number.
Property | Default | Type | Required | Description |
---|---|---|---|---|
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
count_less_than_or_equal | | CountLessThanOrEqualTestOptions | Yes | Configuration options for the count_less_than_or_equal test. |