Version: 3.0.0

Custom Python Read Component

A component that reads data using user-defined custom Python code.

CustomPythonReadComponent

CustomPythonReadComponent is defined beneath the following ancestor nodes in the YAML structure:

Below are the properties for the CustomPythonReadComponent. Each property links to the specific details section further down in this page.

Property | Default | Type | Required | Description
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane, FabricDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component.
description | | string | No | A brief description of what the model does.
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name | | string | Yes | The name of the model.
flow_name | | string | No | The name of the flow that the component belongs to.
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not.
data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the component.
tests | | ComponentTestOptions | No | Defines tests to run on the data of this component.
custom_python_read | | CustomPythonReadOptions | Yes |

Property Details

Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

Property | Default | Type | Required | Description
component | | One of: ReadComponent, TransformComponent, TaskComponent, SingularTestComponent, CustomPythonReadComponent, WriteComponent, CompoundComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component.

CustomPythonReadOptions

Configuration options for the Custom Python Read component.

Property | Default | Type | Required | Description
event_time | | string | No | Timestamp column in the component output used to represent event time.
strategy | full | Any of: string, IncrementalStrategy, PartitionedStrategy | No | Ingest strategy.
python | | Any of: PythonBase, PartitionedListRead | Yes | Python code to execute for ingesting data.
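
As a rough illustration of how the required properties above fit together, the sketch below mirrors a component definition as a Python dictionary (the shape the YAML would parse to, without the enclosing component key) and reports which required properties are absent. The component name, file name, function name, and the `missing_required` helper are hypothetical and are not part of the product.

```python
from typing import List

# Hypothetical component definition, shaped like the parsed YAML.
component = {
    "name": "orders_read",  # required (hypothetical name)
    "custom_python_read": {  # required
        "strategy": "full",  # optional; "full" is the documented default
        "python": {  # required; PythonBase form
            "entrypoint": "read_orders",  # hypothetical function name
            "source": "orders_read.py",  # hypothetical source file
        },
    },
}


def missing_required(component: dict) -> List[str]:
    """Return dotted paths of required properties that are absent."""
    missing = []
    if "name" not in component:
        missing.append("name")
    options = component.get("custom_python_read")
    if options is None:
        return missing + ["custom_python_read"]
    python = options.get("python")
    if python is None:
        return missing + ["custom_python_read.python"]
    # PartitionedListRead carries `list` and `read`, each a PythonBase.
    if "list" in python or "read" in python:
        parts = {key: python.get(key) for key in ("list", "read")}
    else:
        parts = {"": python}
    for prefix, base in parts.items():
        path = "custom_python_read.python" + (f".{prefix}" if prefix else "")
        if base is None:
            missing.append(path)
            continue
        for key in ("entrypoint", "source"):
            if key not in base:
                missing.append(f"{path}.{key}")
    return missing
```

Calling `missing_required(component)` on the definition above returns an empty list, since `name`, `custom_python_read`, `python`, `entrypoint`, and `source` are all present.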

PartitionedListRead

Property | Default | Type | Required | Description
list | | PythonBase | Yes | Python function that lists partitions in the source.
read | | PythonBase | Yes | Python function that reads a partition from the source.
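
The division of labor between the two functions can be sketched as follows. This page does not specify the signatures the platform expects, so the function names, arguments, return types, and the in-memory `SOURCE` are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List

# Stand-in for an external source: partition name -> rows (hypothetical data).
SOURCE: Dict[str, List[dict]] = {
    "2024-01-01": [{"id": 1}, {"id": 2}],
    "2024-01-02": [{"id": 3}],
}


def list_partitions() -> Iterator[str]:
    """`list` role: enumerate the partitions available in the source."""
    yield from sorted(SOURCE)


def read_partition(partition: str) -> List[dict]:
    """`read` role: return the rows of a single partition."""
    return list(SOURCE[partition])


def ingest_all() -> List[dict]:
    """Driver loop standing in for the platform: list, then read each."""
    rows: List[dict] = []
    for partition in list_partitions():
        rows.extend(read_partition(partition))
    return rows
```

Keeping listing separate from reading means a caller can decide which partitions to read without touching the data of the others.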

PythonBase

Base class for Python-based components and resources.

Property | Default | Type | Required | Description
entrypoint | | string | Yes | The entrypoint for the python transform function.
source | | string | Yes | The source file for the python transform function.

BigQueryDataPlane

Property | Default | Type | Required | Description
bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options.

BigQueryDataPlaneOptions

Property | Default | Type | Required | Description
partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table.
cluster_by | | array[string] | No | Clustering keys to be added to the table.

BigQueryRangePartitioning

Property | Default | Type | Required | Description
field | | string | Yes | Field to partition by.
range | | RangeOptions | Yes | Range partitioning options.

BigQueryTimePartitioning

Property | Default | Type | Required | Description
field | | string | Yes | Field to partition by.
granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning.

ComponentTestOptions

Options for component tests, including data quality tests and schema checks.

Property | Default | Type | Required | Description
columns | | object with property values of type array[One of: (Any of: (string, NotNullTest), Any of: (string, NotEmptyTest), Any of: (string, UniqueTest), CombinationUniqueTest, InRangeTest, DateInRangeTest, InSetTest, SubstringMatchTest, CountDistinctEqualTest, CountGreaterThanOrEqualTest, CountGreaterThanTest, CountLessThanOrEqualTest, CountLessThanTest, CountEqualTest, GreaterThanTest, LessThanTest, GreaterThanOrEqualTest, LessThanOrEqualTest, MeanInRangeTest, StddevInRangeTest, ColumnTestSql, ColumnTestPython)] | No | List of column-level data quality tests for a component.
component | | array[One of: (Any of: (string, NotNullTest), Any of: (string, NotEmptyTest), Any of: (string, UniqueTest), CombinationUniqueTest, InRangeTest, DateInRangeTest, InSetTest, SubstringMatchTest, CountDistinctEqualTest, CountGreaterThanOrEqualTest, CountGreaterThanTest, CountLessThanOrEqualTest, CountLessThanTest, CountEqualTest, GreaterThanTest, LessThanTest, GreaterThanOrEqualTest, LessThanOrEqualTest, MeanInRangeTest, StddevInRangeTest, ColumnTestSql, ColumnTestPython)] | No | List of component-level tests.
schema | | ComponentSchemaTest | No | List of schema checks for a component.

ColumnTestPython

Test to validate data using a Python function for a single column.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
name | | string | Yes |
python | | ColumnTestPythonOptions | Yes | Configuration options for the Python column test.
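
The `severity` rule described above (and repeated for every test type on this page) can be pictured with a minimal sketch: a failed 'error' test interrupts processing, while a failed 'warn' test is reported and processing continues. The `FlowInterrupted` exception and the `report` helper are hypothetical names used only for illustration, not the platform's mechanism.

```python
import warnings


class FlowInterrupted(Exception):
    """Hypothetical exception standing in for 'interrupt flow processing'."""


def report(test_name: str, passed: bool, severity: str = "error") -> bool:
    """Apply the documented severity rule to one test outcome."""
    if severity not in ("error", "warn"):
        raise ValueError(f"unknown severity: {severity!r}")
    if passed:
        return True
    if severity == "error":
        # Critical issue: stop processing.
        raise FlowInterrupted(f"test {test_name!r} failed")
    # Minor issue: surface it, but let processing continue.
    warnings.warn(f"test {test_name!r} failed", stacklevel=2)
    return False
```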

ColumnTestPythonOptions

Property | Default | Type | Required | Description
entrypoint | | string | Yes | The entrypoint for the python transform function.
source | | string | Yes | The source file for the python transform function.
params | | object with property values of type None | No | Parameters for the Python test function.
is_asset_test | | boolean | No |

ColumnTestSql

Test to validate data using an SQL query for a single column.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
name | | string | Yes |
sql | | string | No | SQL query that tests data for conditions.

CombinationUniqueTest

Test to check if a combination of columns is unique.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
combination_unique | | CombinationUniqueTestOptions | Yes | Test to check if a combination of columns is unique.

CombinationUniqueTestOptions

Configuration options for the combination_unique test.

Property | Default | Type | Required | Description
columns | | array[string] | Yes | The combination of columns to check for uniqueness.
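
To make "combination of columns" concrete, here is a minimal in-memory sketch of the check: the values of the listed columns are read together as one tuple per row, and any tuple seen more than once is a violation. The `duplicate_combinations` helper and the sample rows are hypothetical; how the platform evaluates the test is not described on this page.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Sequence, Tuple


def duplicate_combinations(
    rows: Iterable[Mapping[str, object]], columns: Sequence[str]
) -> List[Tuple[object, ...]]:
    """Return each column-value combination that occurs more than once."""
    counts = Counter(tuple(row[column] for column in columns) for row in rows)
    return [combo for combo, seen in counts.items() if seen > 1]


rows = [
    {"order_id": 1, "line": "a"},
    {"order_id": 1, "line": "b"},  # same order_id, different line: still unique
    {"order_id": 1, "line": "a"},  # repeats the first combination
]
duplicates = duplicate_combinations(rows, ["order_id", "line"])
```

Note that `order_id` alone repeats in every row, yet only the pair `(1, "a")` is reported, because uniqueness is judged on the combination.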

ComponentSchemaTest

Test to validate that component columns match expected types.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
match | exact | string ("exact", "ignore_missing") | No | The type of schema matching to perform. 'exact' requires all columns to be present, 'ignore_missing' allows for missing columns.
columns | | object with property values of type string | No | A mapping of column names to their expected types.
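
The difference between the two `match` modes can be sketched as follows. This is an illustrative reading of the descriptions above, not the platform's implementation: the `schema_mismatches` helper is hypothetical, types are compared as plain strings, and columns that exist in the data but are not listed in `columns` are ignored, which is an assumption since this page does not say how extra columns are treated.

```python
from typing import Dict, List


def schema_mismatches(
    actual: Dict[str, str], expected: Dict[str, str], match: str = "exact"
) -> List[str]:
    """Compare actual column types against the expected mapping."""
    if match not in ("exact", "ignore_missing"):
        raise ValueError(f"unknown match mode: {match!r}")
    problems = []
    for column, expected_type in expected.items():
        if column not in actual:
            # Only 'exact' treats an absent expected column as a failure.
            if match == "exact":
                problems.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            problems.append(
                f"type mismatch for {column}: "
                f"expected {expected_type}, got {actual[column]}"
            )
    return problems
```

Under both modes a column that is present with the wrong type is reported; the modes differ only in whether an absent expected column counts as a failure.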

CountDistinctEqualTest

Test to check if the number of distinct values is equal to a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
count_distinct_equal | | CountDistinctEqualTestOptions | Yes |

CountDistinctEqualTestOptions

Configuration options for the count_distinct_equal test.

Property | Default | Type | Required | Description
count | | integer | Yes | The number of distinct values to expect.
group_by_columns | | array[string] | No | The columns to group by.

CountEqualTest

Test to check if the number of rows is equal to a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
count_equal | | CountEqualTestOptions | Yes | Configuration options for the count_equal test.

CountEqualTestOptions

Configuration options for the count_equal test.

Property | Default | Type | Required | Description
count | | integer | Yes | The number of rows to expect.

CountGreaterThanOrEqualTest

Test to check if the number of rows is greater than or equal to a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
count_greater_than_or_equal | | CountGreaterThanOrEqualTestOptions | Yes |

CountGreaterThanOrEqualTestOptions

Configuration options for the count_greater_than_or_equal test.

Property | Default | Type | Required | Description
count | | integer | Yes | The value to compare against.
group_by_columns | | array[string] | No | The columns to group by.

CountGreaterThanTest

Test to check if the number of rows is greater than a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
count_greater_than | | CountGreaterThanTestOptions | Yes |

CountGreaterThanTestOptions

Configuration options for the count_greater_than test.

Property | Default | Type | Required | Description
count | | integer | Yes | The value to compare against.
group_by_columns | | array[string] | No | The columns to group by.

CountLessThanOrEqualTest

Test to check if the number of rows is less than or equal to a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
count_less_than_or_equal | | CountLessThanOrEqualTestOptions | Yes |

CountLessThanOrEqualTestOptions

Configuration options for the count_less_than_or_equal test.

Property | Default | Type | Required | Description
count | | integer | Yes | The value to compare against.
group_by_columns | | array[string] | No | The columns to group by.

CountLessThanTest

Test to check if the number of rows is less than a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
count_less_than | | CountLessThanTestOptions | Yes |

CountLessThanTestOptions

Configuration options for the count_less_than test.

Property | Default | Type | Required | Description
count | | integer | Yes | The value to compare against.
group_by_columns | | array[string] | No | The columns to group by.

DataMaintenance

Data maintenance configuration options for the component.

Property | Default | Type | Required | Description
enabled | | boolean | No | A boolean flag indicating whether data maintenance is enabled for the component.

DatabricksDataPlane

Property | Default | Type | Required | Description
databricks | cluster_by: null; pyspark_job_cluster_id: null; table_properties: null | DatabricksDataPlaneOptions | No | Databricks configuration options.

DatabricksDataPlaneOptions

Property | Default | Type | Required | Description
table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your data plane.
pyspark_job_cluster_id | | string | No | The ID of the compute cluster to use for PySpark jobs.
cluster_by | | array[string] | No | Clustering keys to be added to the table.

DateInRangeTest

Test to check if a date is within a certain range.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
date_in_range | | DateInRangeTestOptions | Yes |

DateInRangeTestOptions

Configuration options for the date_in_range test.

Property | Default | Type | Required | Description
min | | string | Yes | The minimum value to expect.
max | | string | Yes | The maximum value to expect.

DuckdbDataPlane

Property | Default | Type | Required | Description
duckdb | | DuckDbDataPlaneOptions | No | Duckdb configuration options.

DuckDbDataPlaneOptions

No properties defined.

FabricDataPlane

Property | Default | Type | Required | Description
fabric | spark_session_config: null | FabricDataPlaneOptions | No | Fabric configuration options.

FabricDataPlaneOptions

Property | Default | Type | Required | Description
spark_session_config | | LivySparkSessionConfig | No | Spark session configuration.

GreaterThanOrEqualTest

Test to check if a value is greater than or equal to a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
greater_than_or_equal | | GreaterThanOrEqualTestOptions | Yes |

GreaterThanOrEqualTestOptions

Configuration options for the greater_than_or_equal test.

Property | Default | Type | Required | Description
value | | Any of: integer, number, string | Yes | The value to compare against.

GreaterThanTest

Test to check if a value is greater than a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
greater_than | | GreaterThanTestOptions | Yes |

GreaterThanTestOptions

Configuration options for the greater_than test.

Property | Default | Type | Required | Description
value | | Any of: integer, number, string | Yes | The value to compare against.

InRangeTest

Test to check if a value is within a certain range.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
in_range | | InRangeTestOptions | Yes |

InRangeTestOptions

Configuration options for the in_range test.

Property | Default | Type | Required | Description
min | | Any of: integer, number, string | Yes | The minimum value to expect.
max | | Any of: integer, number, string | Yes | The maximum value to expect.

InSetTest

Test to check if a value is in a set of values.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
in_set | | InSetTestOptions | Yes |

InSetTestOptions

Configuration options for the in_set test.

Property | Default | Type | Required | Description
values | | array[Any of: (integer, number, string)] | Yes | The set of values to expect.

LessThanOrEqualTest

Test to check if a value is less than or equal to a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
less_than_or_equal | | LessThanOrEqualTestOptions | Yes |

LessThanOrEqualTestOptions

Configuration options for the less_than_or_equal test.

Property | Default | Type | Required | Description
value | | Any of: integer, number, string | Yes | The value to compare against.

LessThanTest

Test to check if a value is less than a certain number.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
less_than | | LessThanTestOptions | Yes |

LessThanTestOptions

Configuration options for the less_than test.

Property | Default | Type | Required | Description
value | | Any of: integer, number, string | Yes | The value to compare against.

MeanInRangeTest

Test to check if the mean of a column's values is within a certain range.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
mean_in_range | | MeanInRangeTestOptions | Yes |

MeanInRangeTestOptions

Configuration options for the mean_in_range test.

Property | Default | Type | Required | Description
min | | Any of: integer, number, string | Yes | The minimum value to expect.
max | | Any of: integer, number, string | Yes | The maximum value to expect.

NotEmptyTest

Test to check if a value is not empty.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
not_empty | | NoTestOptions | No | Test to check if a value is not empty.

NotNullTest

Test to check if a value is not null.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
not_null | | NoTestOptions | No | Test to check if a value is not null.

RangeOptions

Property | Default | Type | Required | Description
start | | integer | Yes | Start of the range partitioning.
end | | integer | Yes | End of the range partitioning.
interval | | integer | Yes | Interval of the range partitioning.
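
To show how `start`, `end`, and `interval` relate, the sketch below computes the buckets they would define, assuming half-open buckets with an inclusive start and an exclusive end, as in BigQuery integer-range partitioning. Both helper functions are hypothetical, and the handling of values outside the range (returned here as `None`) is an assumption for illustration.

```python
from typing import List, Optional, Tuple


def range_buckets(start: int, end: int, interval: int) -> List[Tuple[int, int]]:
    """List the half-open [low, high) buckets covering [start, end)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if end <= start:
        raise ValueError("end must be greater than start")
    return [(low, min(low + interval, end)) for low in range(start, end, interval)]


def bucket_for(
    value: int, start: int, end: int, interval: int
) -> Optional[Tuple[int, int]]:
    """Return the bucket holding value, or None if it is outside the range."""
    if not start <= value < end:
        return None
    low = start + ((value - start) // interval) * interval
    return (low, min(low + interval, end))
```

For example, `start=0`, `end=100`, `interval=25` yields four buckets, and the value 60 falls into the bucket `(50, 75)`.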

SnowflakeDataPlane

Property | Default | Type | Required | Description
snowflake | | SnowflakeDataPlaneOptions | Yes | Snowflake configuration options.

SnowflakeDataPlaneOptions

Property | Default | Type | Required | Description
cluster_by | | array[string] | No | Clustering keys to be added to the table.

IncrementalStrategy

Incremental Processing Strategy.

Property | Default | Type | Required | Description
incremental | | Any of: string, MergeStrategy, SCDType2Strategy | Yes | Incremental processing strategy.
on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.

PartitionedStrategy

Partitioned Ingest Strategy. The user is expected to provide 2 functions, a list function that lists partitions in the source, and a read function that reads a partition from the source.

Property | Default | Type | Required | Description
partitioned | | PartitionedOptions | No | Options for partitioning data.
on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.

PartitionedOptions

Options related to partition optimization - in particular, the policy that determines which partitions to ingest.

Property | Default | Type | Required | Description
enable_substitution_by_partition_name | | boolean | Yes | Enable substitution by partition name.
output_type | table | string ("table", "view") | No | Output type for partitioned data. Must be either 'table' or 'view'. This strategy applies only to Transforms.

SCDType2Strategy

The SCD Type 2 strategy allows users to track changes to records over time, by tracking the start and end times for each version of a record. A brief overview of the strategy can be found at https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row.

Property | Default | Type | Required | Description
scd_type_2 | | KeyOptions | No | Options for SCD Type 2 strategy.
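
The idea of tracking start and end times for each version of a record can be illustrated with a small in-memory sketch. It is not the platform's implementation: the `apply_scd2` helper, the `valid_from` and `valid_to` column names, and the use of `None` to mark the current version are all assumptions made for the example.

```python
from typing import Dict, List


def apply_scd2(
    history: List[dict], incoming: List[dict], unique_key: str, now: int
) -> List[dict]:
    """Close out changed records and append their new versions."""
    result = [dict(row) for row in history]  # leave the input untouched
    # Current version per key: the row whose valid_to is still open.
    current: Dict[object, dict] = {
        row[unique_key]: row for row in result if row["valid_to"] is None
    }
    for row in incoming:
        active = current.get(row[unique_key])
        if active is not None:
            if all(active.get(column) == value for column, value in row.items()):
                continue  # nothing changed, keep the current version open
            active["valid_to"] = now  # end the old version
        new_version = {**row, "valid_from": now, "valid_to": None}
        result.append(new_version)
        current[row[unique_key]] = new_version
    return result
```

A changed record therefore ends up as two rows: the old version with its end time filled in, and a new open-ended version starting at the same moment.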

StddevInRangeTest

Test to check if the standard deviation of a column's values is within a certain range.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
stddev_in_range | | StddevInRangeTestOptions | Yes |

StddevInRangeTestOptions

Configuration options for the stddev_in_range test.

Property | Default | Type | Required | Description
min | | Any of: integer, number, string | Yes | The minimum value to expect.
max | | Any of: integer, number, string | Yes | The maximum value to expect.

SubstringMatchTest

Test to check if a value contains a substring.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
substring_match | | SubstringMatchTestOptions | Yes |

SubstringMatchTestOptions

Configuration options for the substring_match test.

Property | Default | Type | Required | Description
substring | | string | Yes | The substring to search for.

SynapseDataPlane

Property | Default | Type | Required | Description
synapse | spark_session_config: null | SynapseDataPlaneOptions | No | Synapse configuration options.

SynapseDataPlaneOptions

Property | Default | Type | Required | Description
spark_session_config | | LivySparkSessionConfig | No | Spark session configuration.

LivySparkSessionConfig

Property | Default | Type | Required | Description
pool | | string | No | The pool to use for the Spark session.
driver_memory | | string | No | The memory to use for the Spark driver.
driver_cores | | integer | No | The number of cores to use for the Spark driver.
executor_memory | | string | No | The memory to use for the Spark executor.
executor_cores | | integer | No | The number of cores to use for each executor.
num_executors | | integer | No | The number of executors to use for the Spark session.
session_key_override | | string | No | The key to use for the Spark session.
max_concurrent_sessions | | integer | No | The maximum number of concurrent sessions of this spec to create.

UniqueTest

Test to check if a value is unique.

Property | Default | Type | Required | Description
severity | error | string ("error", "warn") | No | The severity level for issues raised by the test. Default is 'error'. Use 'error' for critical issues that should interrupt flow processing. Use 'warn' for warnings/minor issues that should not interrupt flow processing.
unique | | NoTestOptions | No | Test to check if a value is unique.

NoTestOptions

Configuration options for tests that have no test body definition (not_null, unique, etc.).

No properties defined.

ResourceMetadata

Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.

Property | Default | Type | Required | Description
source | | ResourceLocation | No | The origin or source information for the resource.
source_event_uuid | | string | No | UUID of the event that is associated with creation of this resource.

ResourceLocation

The origin or source information for the resource.

Property | Default | Type | Required | Description
path | | string | Yes | Path within repository files where the resource is defined.
first_line_number | | integer | No | First line number within path file where the resource is defined.

MergeStrategy

A strategy that involves merging new data with existing data by updating existing records that match the unique key.

Property | Default | Type | Required | Description
merge | | KeyOptions | No | Options for merge strategy.

KeyOptions

Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.

Property | Default | Type | Required | Description
unique_key | | string | Yes | Column or comma-separated set of columns used as a unique identifier for records, aiding in the merge process.
deletion_column | | string | No | Column name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy will be able to detect deletions and mark them as deleted in the destination. If not provided, the merge strategy will not be able to detect deletions.
merge_update_columns | | Any of: string, array[string] | No | List of columns to include when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_exclude_columns.
merge_exclude_columns | | Any of: string, array[string] | No | List of columns to exclude when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_update_columns.
incremental_predicates | | Any of: string, array[string] | No | List of conditions to filter incremental data.
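
How `unique_key`, `deletion_column`, and `merge_update_columns` interact can be pictured with a small in-memory sketch of a merge. This is only an illustration under stated assumptions, not the platform's implementation: the `merge_rows` helper is hypothetical, rows are plain dictionaries, and a soft-deleted record is marked by setting the deletion column to `True` in the destination.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def merge_rows(
    existing: List[dict],
    incoming: List[dict],
    unique_key: str,
    deletion_column: Optional[str] = None,
    merge_update_columns: Optional[Sequence[str]] = None,
) -> List[dict]:
    """Upsert incoming rows into existing rows, matched on unique_key."""
    # unique_key may name one column or a comma-separated set of columns.
    key_columns = [column.strip() for column in unique_key.split(",")]

    def key_of(row: dict) -> Tuple[object, ...]:
        return tuple(row[column] for column in key_columns)

    merged: Dict[Tuple[object, ...], dict] = {
        key_of(row): dict(row) for row in existing
    }
    for row in incoming:
        key = key_of(row)
        if key not in merged:
            merged[key] = dict(row)  # no match: insert as a new record
            continue
        if merge_update_columns is not None:
            columns = list(merge_update_columns)
        else:
            columns = [column for column in row if column not in key_columns]
        for column in columns:
            if column in row:
                merged[key][column] = row[column]  # match: update in place
        if deletion_column and row.get(deletion_column):
            merged[key][deletion_column] = True  # mark as deleted, keep the row
    return list(merged.values())
```

When `merge_update_columns` is given, only those columns are overwritten on a match, so other columns keep their destination values.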