Version: 3.0.0

Custom Python Read Component

A component that reads data using user-defined custom Python code.

CustomPythonReadComponent

CustomPythonReadComponent is defined beneath the following ancestor nodes in the YAML structure:

Below are the properties for the CustomPythonReadComponent. Each property links to the specific details section further down in this page.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component. |
| skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
| retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
| description | | string | No | A brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| flow_name | | string | No | The name of the flow that the component belongs to. |
| data_maintenance | | | No | The data maintenance configuration options for the component. |
| tests | | | No | Defines tests to run on the data of this component. |
| custom_python_read | | | Yes | |
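
As a rough illustration of how these properties nest, the sketch below writes a component definition as a Python dictionary that mirrors the YAML structure and checks the two required properties. The component name, file name, and function names are hypothetical, the shape of the `list` and `read` entries (an `entrypoint` plus a `source`, as in PythonBase) is an assumption, and `validate_component` is an illustrative helper, not part of the product.

```python
# Hypothetical component definition, written as a dict that mirrors the YAML.
component = {
    "name": "orders_raw",  # required; example name only
    "description": "Reads order files with custom Python code.",
    "skip": False,
    "custom_python_read": {  # required
        "strategy": "full",  # the documented default
        "python": {
            # PartitionedListRead: both functions are required.
            # Assumed shape: entrypoint + source, as in PythonBase.
            "list": {"entrypoint": "list_partitions", "source": "orders_reader.py"},
            "read": {"entrypoint": "read_partition", "source": "orders_reader.py"},
        },
    },
}

REQUIRED = ("name", "custom_python_read")


def validate_component(spec: dict) -> list[str]:
    """Return the required top-level properties that are missing from spec."""
    return [key for key in REQUIRED if key not in spec]


print(validate_component(component))  # []
```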

Property Details

Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |

CustomPythonReadOptions

Configuration options for the Custom Python Read component.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| dependencies | | array[None] | No | List of dependencies that must complete before this component runs. |
| event_time | | string | No | Timestamp column in the component output used to represent event time. |
| strategy | full | Any of: string, IncrementalStrategy, PartitionedStrategy | No | Ingest strategy. |
| python | | Any of: PartitionedListRead | Yes | Python code to execute for ingesting data. |

PartitionedListRead

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| list | | | Yes | Python function that lists partitions in the source. |
| read | | | Yes | Python function that reads a partition from the source. |
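
The list/read split can be pictured with a small, self-contained sketch. The function names, their signatures, and the in-memory `FAKE_SOURCE` are all hypothetical; the arguments the platform actually passes to these functions are not described on this page, so treat this only as an outline of the two roles.

```python
from typing import Iterator

# In-memory stand-in for an external source; keys play the role of partition names.
FAKE_SOURCE = {
    "2024-01-01": [{"id": 1, "amount": 10.0}],
    "2024-01-02": [{"id": 2, "amount": 12.5}, {"id": 3, "amount": 7.0}],
}


def list_partitions() -> list[str]:
    """List the partitions currently available in the source."""
    return sorted(FAKE_SOURCE)


def read_partition(partition: str) -> Iterator[dict]:
    """Yield the rows of a single partition."""
    yield from FAKE_SOURCE[partition]


# Read every listed partition, one at a time.
for name in list_partitions():
    rows = list(read_partition(name))
    print(name, len(rows))
```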

PythonBase

Base class for Python-based components and resources.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| entrypoint | | string | Yes | The entrypoint for the python transform function. |
| source | | string | Yes | The source file for the python transform function. |

BigQueryDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| bigquery | | | Yes | BigQuery configuration options. |

BigQueryDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partition_by | | Any of: | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

BigQueryRangePartitioning

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| range | | | Yes | Range partitioning options. |

BigQueryTimePartitioning

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |

ComponentTestOptions

Options for component tests, including data quality tests and schema checks.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| columns | | object with property values of type array[One of: (Any of: (string, NotNullTest), Any of: (string, NotEmptyTest), Any of: (string, UniqueTest), CombinationUniqueTest, InRangeTest, DateInRangeTest, InSetTest, SubstringMatchTest, CountDistinctEqualTest, CountGreaterThanOrEqualTest, CountGreaterThanTest, CountLessThanOrEqualTest, CountLessThanTest, CountEqualTest, GreaterThanTest, LessThanTest, GreaterThanOrEqualTest, LessThanOrEqualTest, MeanInRangeTest, StddevInRangeTest, ColumnTestSql, ColumnTestPython)] | No | List of column-level data quality tests for a component. |
| component | | array[One of: (Any of: (string, NotNullTest), Any of: (string, NotEmptyTest), Any of: (string, UniqueTest), CombinationUniqueTest, InRangeTest, DateInRangeTest, InSetTest, SubstringMatchTest, CountDistinctEqualTest, CountGreaterThanOrEqualTest, CountGreaterThanTest, CountLessThanOrEqualTest, CountLessThanTest, CountEqualTest, GreaterThanTest, LessThanTest, GreaterThanOrEqualTest, LessThanOrEqualTest, MeanInRangeTest, StddevInRangeTest, ColumnTestSql, ColumnTestPython)] | No | List of component-level tests. |
| schema | | | No | List of schema checks for a component. |

ColumnTestPython

Test to validate data using a Python function for a single column.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| name | | string | Yes | |
| python | | | Yes | Configuration options for the Python column test. |

ColumnTestPythonOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| entrypoint | | string | Yes | The entrypoint for the python transform function. |
| source | | string | Yes | The source file for the python transform function. |
| params | | object with property values of type None | No | Parameters for the Python test function. |
| is_asset_test | | boolean | No | |

ColumnTestSql

Test to validate data using an SQL query for a single column.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| name | | string | Yes | |
| sql | | string | No | SQL query that tests data for conditions. |

CombinationUniqueTest

Test to check if a combination of column values is unique.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| combination_unique | | | Yes | Test to check if a combination of column values is unique. |

CombinationUniqueTestOptions

Configuration options for the combination_unique test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| columns | | array[string] | Yes | The combination of columns to check for uniqueness. |
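
To make "combination of columns" concrete, here is a minimal sketch of the kind of check this test describes. The function name and the row representation (a list of dictionaries) are illustrative assumptions, not the platform's implementation.

```python
def combination_is_unique(rows: list[dict], columns: list[str]) -> bool:
    """Return True if no two rows share the same values across all given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


rows = [
    {"order_id": 1, "line": 1},
    {"order_id": 1, "line": 2},
    {"order_id": 2, "line": 1},
]
print(combination_is_unique(rows, ["order_id", "line"]))  # True
print(combination_is_unique(rows, ["order_id"]))          # False
```

Neither column is unique on its own in this example, but the pair is.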

ComponentSchemaTest

Test to validate that component columns match expected types.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| match | exact | string ("exact", "ignore_missing") | No | The type of schema matching to perform. 'exact' requires all columns to be present, 'ignore_missing' allows for missing columns. |
| columns | | object with property values of type string | No | A mapping of column names to their expected types. |
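
The difference between the two `match` modes can be sketched as follows. This is an illustration under assumptions: the function name, the type names, and the decision to ignore columns that are present but not listed in `expected` are all hypothetical, since the page does not say how extra columns are treated.

```python
def check_schema(actual: dict[str, str], expected: dict[str, str],
                 match: str = "exact") -> list[str]:
    """Return a list of problems; an empty list means the schema check passes."""
    if match not in ("exact", "ignore_missing"):
        raise ValueError(f"unknown match mode: {match}")
    problems = []
    for column, expected_type in expected.items():
        if column not in actual:
            # Only 'exact' treats an absent column as a failure.
            if match == "exact":
                problems.append(f"missing column: {column}")
            continue
        if actual[column] != expected_type:
            problems.append(
                f"type mismatch for {column}: expected {expected_type}, got {actual[column]}"
            )
    return problems


actual = {"id": "int", "name": "string"}
expected = {"id": "int", "name": "string", "email": "string"}
print(check_schema(actual, expected, "exact"))           # ['missing column: email']
print(check_schema(actual, expected, "ignore_missing"))  # []
```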

CountDistinctEqualTest

Test to check if the number of distinct values is equal to a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| count_distinct_equal | | | Yes | |

CountDistinctEqualTestOptions

Configuration options for the count_distinct_equal test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| count | | integer | Yes | The number of distinct values to expect. |
| group_by_columns | | array[string] | No | The columns to group by. |

CountEqualTest

Test to check if the number of rows is equal to a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| count_equal | | | Yes | Configuration options for the count_equal test. |

CountEqualTestOptions

Configuration options for the count_equal test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| count | | integer | Yes | The number of rows to expect. |

CountGreaterThanOrEqualTest

Test to check if the number of rows is greater than or equal to a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| count_greater_than_or_equal | | | Yes | |

CountGreaterThanOrEqualTestOptions

Configuration options for the count_greater_than_or_equal test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| count | | integer | Yes | The value to compare against. |
| group_by_columns | | array[string] | No | The columns to group by. |
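
One plausible reading of `group_by_columns` is that the row-count comparison is applied to each group separately. The sketch below illustrates that reading only; the function name and the per-group interpretation are assumptions, not confirmed behavior.

```python
from collections import Counter
from typing import Optional


def count_greater_than_or_equal(rows: list[dict], count: int,
                                group_by_columns: Optional[list[str]] = None) -> bool:
    """Return True if the row count (per group, when grouping) is >= count."""
    if not group_by_columns:
        return len(rows) >= count
    # Count rows per distinct combination of the grouping columns.
    groups = Counter(tuple(row[c] for c in group_by_columns) for row in rows)
    return all(n >= count for n in groups.values())


rows = [
    {"region": "eu", "id": 1},
    {"region": "eu", "id": 2},
    {"region": "us", "id": 3},
]
print(count_greater_than_or_equal(rows, 3))              # True
print(count_greater_than_or_equal(rows, 2, ["region"]))  # False
```

The second call fails because the "us" group holds only one row, even though the table as a whole has three.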

CountGreaterThanTest

Test to check if the number of rows is greater than a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| count_greater_than | | | Yes | |

CountGreaterThanTestOptions

Configuration options for the count_greater_than test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| count | | integer | Yes | The value to compare against. |
| group_by_columns | | array[string] | No | The columns to group by. |

CountLessThanOrEqualTest

Test to check if the number of rows is less than or equal to a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| count_less_than_or_equal | | | Yes | |

CountLessThanOrEqualTestOptions

Configuration options for the count_less_than_or_equal test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| count | | integer | Yes | The value to compare against. |
| group_by_columns | | array[string] | No | The columns to group by. |

CountLessThanTest

Test to check if the number of rows is less than a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| count_less_than | | | Yes | |

CountLessThanTestOptions

Configuration options for the count_less_than test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| count | | integer | Yes | The value to compare against. |
| group_by_columns | | array[string] | No | The columns to group by. |

DataMaintenance

Data maintenance configuration options for the component.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| enabled | | boolean | No | A boolean flag indicating whether data maintenance is enabled for the component. |

DatabricksDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| databricks | cluster_by: null; pyspark_job_cluster_id: null; table_properties: null | | No | Databricks configuration options. |

DatabricksDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your data plane. |
| pyspark_job_cluster_id | | string | No | The ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

DateInRangeTest

Test to check if a date is within a certain range.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| date_in_range | | | Yes | |

DateInRangeTestOptions

Configuration options for the date_in_range test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| min | | string | Yes | The minimum value to expect. |
| max | | string | Yes | The maximum value to expect. |

DuckdbDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| duckdb | | | No | Duckdb configuration options. |

DuckDbDataPlaneOptions

No properties defined.

FabricDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| fabric | spark_session_config: null | | No | Fabric configuration options. |

FabricDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | | No | Spark session configuration. |

GreaterThanOrEqualTest

Test to check if a value is greater than or equal to a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| greater_than_or_equal | | | Yes | |

GreaterThanOrEqualTestOptions

Configuration options for the greater_than_or_equal test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| value | | Any of: integer, number, string | Yes | The value to compare against. |

GreaterThanTest

Test to check if a value is greater than a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| greater_than | | | Yes | |

GreaterThanTestOptions

Configuration options for the greater_than test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| value | | Any of: integer, number, string | Yes | The value to compare against. |

InRangeTest

Test to check if a value is within a certain range.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| in_range | | | Yes | |

InRangeTestOptions

Configuration options for the in_range test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| min | | Any of: integer, number, string | Yes | The minimum value to expect. |
| max | | Any of: integer, number, string | Yes | The maximum value to expect. |

InSetTest

Test to check if a value is in a set of values.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| in_set | | | Yes | |

InSetTestOptions

Configuration options for the in_set test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| values | | array[Any of: (integer, number, string)] | Yes | The set of values to expect. |

LessThanOrEqualTest

Test to check if a value is less than or equal to a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| less_than_or_equal | | | Yes | |

LessThanOrEqualTestOptions

Configuration options for the less_than_or_equal test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| value | | Any of: integer, number, string | Yes | The value to compare against. |

LessThanTest

Test to check if a value is less than a certain number.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| less_than | | | Yes | |

LessThanTestOptions

Configuration options for the less_than test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| value | | Any of: integer, number, string | Yes | The value to compare against. |

MeanInRangeTest

Test to check if the mean of a column's values is within a certain range.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| mean_in_range | | | Yes | |

MeanInRangeTestOptions

Configuration options for the mean_in_range test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| min | | Any of: integer, number, string | Yes | The minimum value to expect. |
| max | | Any of: integer, number, string | Yes | The maximum value to expect. |

NotEmptyTest

Test to check if a value is not empty.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| not_empty | | | No | Test to check if a value is not empty. |

NotNullTest

Test to check if a value is not null.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| not_null | | | No | Test to check if a value is not null. |

RangeOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |
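
How `start`, `end`, and `interval` carve values into partitions can be sketched with a little arithmetic. The helper below is hypothetical and assumes the convention used by BigQuery integer-range partitioning, where `start` is inclusive and `end` is exclusive; returning `None` for out-of-range values is a choice made for this illustration.

```python
from typing import Optional


def range_partition_start(value: int, start: int, end: int, interval: int) -> Optional[int]:
    """Return the lower bound of the partition holding value, or None if out of range."""
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    if value < start or value >= end:
        return None  # outside [start, end)
    return start + ((value - start) // interval) * interval


# With start=0, end=100, interval=10 the partitions are [0, 10), [10, 20), ...
print(range_partition_start(25, 0, 100, 10))   # 20
print(range_partition_start(100, 0, 100, 10))  # None
```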

SnowflakeDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| snowflake | | | Yes | Snowflake configuration options. |

SnowflakeDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

IncrementalStrategy

Incremental Processing Strategy.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| incremental | | Any of: string, MergeStrategy, SCDType2Strategy | Yes | Incremental processing strategy. |
| on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided. |

PartitionedStrategy

Partitioned Ingest Strategy. The user is expected to provide 2 functions, a list function that lists partitions in the source, and a read function that reads a partition from the source.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partitioned | | | No | Options for partitioning data. |
| on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided. |

PartitionedOptions

Options related to partition optimization - in particular, the policy that determines which partitions to ingest.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| enable_substitution_by_partition_name | | boolean | Yes | Enable substitution by partition name. |
| output_type | table | string ("table", "view") | No | Output type for partitioned data. Must be either 'table' or 'view'. This strategy applies only to Transforms. |

SCDType2Strategy

The SCD Type 2 strategy allows users to track changes to records over time, by tracking the start and end times for each version of a record. A brief overview of the strategy can be found at https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| scd_type_2 | | | No | Options for SCD Type 2 strategy. |
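
The idea of tracking start and end times per record version can be sketched in a few lines. This is a generic SCD Type 2 illustration, not the platform's implementation: the `valid_from` and `valid_to` column names, the function name, and the single-column key are all assumptions.

```python
def apply_scd2(history: list[dict], incoming: dict, unique_key: str, now: str) -> list[dict]:
    """Return a new history in which incoming supersedes the current version, if it changed."""
    result = [dict(row) for row in history]
    # The current version of a record is the one whose end time is still open.
    current = next(
        (r for r in result
         if r[unique_key] == incoming[unique_key] and r["valid_to"] is None),
        None,
    )
    if current is not None:
        tracked = {k: v for k, v in current.items() if k not in ("valid_from", "valid_to")}
        if tracked == incoming:
            return result  # nothing changed, keep the current version open
        current["valid_to"] = now  # close the old version
    result.append({**incoming, "valid_from": now, "valid_to": None})
    return result


history = [{"id": 1, "city": "Oslo", "valid_from": "2024-01-01", "valid_to": None}]
history = apply_scd2(history, {"id": 1, "city": "Bergen"}, "id", "2024-06-01")
for row in history:
    print(row)
```

After the call, the "Oslo" row is closed at 2024-06-01 and a new open-ended "Bergen" row is appended.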

StddevInRangeTest

Test to check if the standard deviation of a column's values is within a certain range.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| stddev_in_range | | | Yes | |

StddevInRangeTestOptions

Configuration options for the stddev_in_range test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| min | | Any of: integer, number, string | Yes | The minimum value to expect. |
| max | | Any of: integer, number, string | Yes | The maximum value to expect. |

SubstringMatchTest

Test to check if a value contains a substring.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| substring_match | | | Yes | |

SubstringMatchTestOptions

Configuration options for the substring_match test.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| substring | | string | Yes | The substring to search for. |

SynapseDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| synapse | spark_session_config: null | | No | Synapse configuration options. |

SynapseDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | | No | Spark session configuration. |

LivySparkSessionConfig

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| pool | | string | No | The pool to use for the Spark session. |
| driver_memory | | string | No | The memory to use for the Spark driver. |
| driver_cores | | integer | No | The number of cores to use for the Spark driver. |
| executor_memory | | string | No | The memory to use for the Spark executor. |
| executor_cores | | integer | No | The number of cores to use for each executor. |
| num_executors | | integer | No | The number of executors to use for the Spark session. |
| session_key_override | | string | No | The key to use for the Spark session. |
| max_concurrent_sessions | | integer | No | The maximum number of concurrent sessions of this spec to create. |

UniqueTest

Test to check if a value is unique.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing. |
| unique | | | No | Test to check if a value is unique. |

NoTestOptions

Configuration options for tests that have no test body definition (not_null, unique, etc.).

No properties defined.

ResourceMetadata

Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| source | | | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | UUID of the event that is associated with creation of this resource. |

ResourceLocation

The origin or source information for the resource.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | | string | Yes | Path within repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within path file where the resource is defined. |

RetryStrategy

Retry strategy configuration for component operations. This configuration leverages the tenacity library to implement robust retry mechanisms. The configuration options map directly to tenacity's retry parameters. Details on the tenacity library can be found here: https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api

The current implementation includes:

- stop_after_attempt: Maximum number of retry attempts.
- stop_after_delay: Give up on retries one attempt before you would exceed the delay.

At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| stop_after_attempt | | integer | No | The number of attempts before giving up. If not set, retries are not limited by the number of attempts. |
| stop_after_delay | | integer | No | Give up on retries one attempt before you would exceed the delay. If not set, retries are not limited by elapsed time. |
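
The interaction of the two stop conditions can be sketched with the standard library alone. The loop below is a simplified illustration, not tenacity's implementation and not the platform's code: `call_with_retry` is a hypothetical helper, it retries immediately without waiting, it treats the delay as seconds (an assumption), and it stops as soon as either configured limit is reached.

```python
import time
from typing import Callable, Optional


def call_with_retry(func: Callable[[], object],
                    stop_after_attempt: Optional[int] = None,
                    stop_after_delay: Optional[float] = None,
                    clock: Callable[[], float] = time.monotonic) -> object:
    """Call func until it succeeds or one of the configured stop conditions is met."""
    if stop_after_attempt is None and stop_after_delay is None:
        raise ValueError("supply at least one of stop_after_attempt or stop_after_delay")
    started = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()
        except Exception:
            out_of_attempts = stop_after_attempt is not None and attempt >= stop_after_attempt
            out_of_time = stop_after_delay is not None and clock() - started >= stop_after_delay
            if out_of_attempts or out_of_time:
                raise  # give up and surface the last error


calls = {"n": 0}


def flaky() -> str:
    """Fail twice, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(call_with_retry(flaky, stop_after_attempt=3))  # ok
```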

InputComponent

Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| flow | | string | Yes | Name of the parent flow that the input component belongs to. |
| name | | string | Yes | The input component name. |
| alias | | string | No | The alias to use for the input component. |
| partition_spec | | Any of: string ("full_reduction", "map") | No | The type of partitioning to apply to the component's input data before processing the component's logic. Input partitioning is applied before the component's logic is executed. |
| where | | string | No | An optional filter condition to apply to the input component's data. |
| partition_binding | | Any of: string | No | An optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions. |

MergeStrategy

A strategy that involves merging new data with existing data by updating existing records that match the unique key.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| merge | | | No | Options for merge strategy. |

KeyOptions

Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| unique_key | | string | Yes | Column or comma-separated set of columns used as a unique identifier for records, aiding in the merge process. |
| deletion_column | | string | No | Column name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy will be able to detect deletions and mark them as deleted in the destination. If not provided, the merge strategy will not be able to detect deletions. |
| merge_update_columns | | Any of: string, array[string] | No | List of columns to include when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_exclude_columns. |
| merge_exclude_columns | | Any of: string, array[string] | No | List of columns to exclude when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_update_columns. |
| incremental_predicates | | Any of: string, array[string] | No | List of conditions to filter incremental data. |
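
The role of `unique_key` in a merge can be pictured with a small in-memory sketch: incoming rows update existing rows that share the key and are appended otherwise. The function name and row representation are hypothetical, and the other options (`deletion_column`, `merge_update_columns`, `merge_exclude_columns`, `incremental_predicates`) are deliberately left out.

```python
def merge_rows(existing: list[dict], incoming: list[dict], unique_key: str) -> list[dict]:
    """Upsert incoming rows into existing rows, matching on the unique key column(s)."""
    # unique_key may be one column or a comma-separated set of columns.
    keys = [k.strip() for k in unique_key.split(",")]
    merged = {tuple(row[k] for k in keys): dict(row) for row in existing}
    for row in incoming:
        key = tuple(row[k] for k in keys)
        # Matching records are updated in place; new records are added.
        merged[key] = {**merged.get(key, {}), **row}
    return list(merged.values())


existing = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
incoming = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]
for row in merge_rows(existing, incoming, "id"):
    print(row)
```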

PartitionBinding

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| logical_operator | | string ("AND", "OR") | No | The logical operator to use to combine the partition binding predicates provided. |
| predicates | | array[string] | No | The list of partition binding predicates to apply to the input component's data. |
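
As a rough picture of how a logical operator might combine several predicates into one condition, consider the sketch below. The helper name, the parenthesized output format, and the example predicate strings are hypothetical; the actual predicate syntax is not described on this page.

```python
def combine_predicates(predicates: list[str], logical_operator: str) -> str:
    """Join predicates into one condition using AND or OR."""
    if logical_operator not in ("AND", "OR"):
        raise ValueError("logical_operator must be 'AND' or 'OR'")
    # Parenthesize each predicate so the combined expression stays unambiguous.
    return f" {logical_operator} ".join(f"({p})" for p in predicates)


print(combine_predicates(["a.day = b.day", "a.region = b.region"], "AND"))
# (a.day = b.day) AND (a.region = b.region)
```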

RepartitionSpec

Specification for repartitioning operations on an input component's data.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| repartition | | | No | Options for repartitioning the input component's data. |

RepartitionOptions

Options for repartitioning the input component's data.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partition_by | | string | Yes | The column to partition by. |
| granularity | | string | Yes | The granularity to use for the partitioning. |