Version: 3.0.0

Python Transform

Python transforms execute Python code to transform data.

PythonTransform

info

PythonTransform is defined beneath the following ancestor nodes in the YAML structure:

Below are the properties for the PythonTransform. Each property links to its details section further down the page.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| inputs | | array[InputComponent] | No | List of input components to use in the Python transform. |
| materialization | | Any of: PartitionMaterialization, IncrementalMaterialization, string ("view", "table") | No | Options for how data is materialized during the transform process. |
| python | | PythonTransformComponent | No | Python transform function to execute for transforming the data. |
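Putting these properties together, a PythonTransform block might look like the following sketch. All names and paths here (ingest_flow, raw_events, clean_events, transforms/clean_events.py) are hypothetical placeholders; the surrounding ancestor nodes are omitted, so adapt the nesting to your project's YAML structure.

```yaml
inputs:
  - flow: ingest_flow                  # hypothetical parent flow
    name: raw_events                   # hypothetical input component
materialization: table                 # or "view", or a Partition/IncrementalMaterialization block
python:
  entrypoint: clean_events             # function to call (hypothetical name)
  source: transforms/clean_events.py   # file containing the function (hypothetical path)
```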

Property Details

Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| component | | One of: ReadComponent, TransformComponent, TaskComponent, SingularTestComponent, CustomPythonReadComponent, WriteComponent, CompoundComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |

TransformComponent

A component that executes SQL or Python code to transform data.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane | No | Data plane-specific configuration options for the component. |
| name | | string | No | The name of the model. |
| description | | string | No | A brief description of what the model does. |
| metadata | | ResourceMetadata | No | Metadata for a resource. In most cases it does not affect system behavior, but it may be helpful when analyzing project resources. |
| flow_name | | string | No | The name of the flow that the component belongs to. |
| skip | | boolean | No | Whether to skip processing for the component. |
| data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the component. |
| tests | | ComponentTestColumn | No | Defines tests to run on the data of this component. |
| transform | | One of: SqlTransform, PythonTransform, SnowparkTransform | Yes | The transform that executes SQL or Python code to transform data. |
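The tables above imply a nesting in which a PythonTransform sits under a TransformComponent's transform property, which in turn sits under a component entry. The sketch below is inferred from the property tables, not taken from a documented example, and all concrete names are hypothetical; verify the exact nesting against your framework's schema.

```yaml
component:
  transform:
    name: clean_events                     # hypothetical model name
    description: Cleans raw event data.
    flow_name: ingest_flow                 # hypothetical flow
    skip: false
    transform:                             # the PythonTransform itself
      python:
        entrypoint: clean_events
        source: transforms/clean_events.py
```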

PythonTransformComponent

Python transform function to execute for transforming the data.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| entrypoint | | string | Yes | The entrypoint for the Python transform function. |
| source | | string | Yes | The source file for the Python transform function. |
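The entrypoint names a function defined in the source file. The signature the framework passes to that function is not specified in this section; purely as an illustration, the sketch below assumes a function that receives rows as a list of dicts and returns transformed rows (a real framework may well pass a DataFrame-like object instead). The function and column names are hypothetical.

```python
# Hypothetical contents of transforms/enrich_orders.py.
# The function name below is what `entrypoint` would reference;
# the rows-in/rows-out signature is an assumption for illustration,
# not the framework's documented interface.

def enrich_orders(rows):
    """Add a derived `total` column to each input row."""
    out = []
    for row in rows:
        enriched = dict(row)
        enriched["total"] = row["quantity"] * row["unit_price"]
        out.append(enriched)
    return out

# Small usage example with in-memory data.
sample = [{"quantity": 2, "unit_price": 5.0},
          {"quantity": 1, "unit_price": 3.5}]
result = enrich_orders(sample)
```

Keeping the transform a pure function of its inputs, as above, makes it easy to unit-test outside the framework.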

IncrementalMaterialization

Container for options for how data is materialized and stored for the incremental case.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| incremental | | IncrementalOptions | No | Options for incrementally materializing the component's data. |

IncrementalOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| unique_key | | string | No | Column or set of columns used as a unique identifier for records. |
| strategy | | string | No | Strategy to use for incremental data; one of merge, partition, or none. |
| merge_update_columns | | Any of: string, array[string] | No | Columns to include when updating values in a merge. Mutually exclusive with merge_exclude_columns. |
| merge_exclude_columns | | Any of: string, array[string] | No | Columns to exclude when updating values in a merge. Mutually exclusive with merge_update_columns. |
| incremental_predicates | | Any of: string, array[string] | No | List of conditions used to filter incremental data. |
| on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. |
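An incremental materialization using the merge strategy might be configured as in this sketch. The column names and predicate are hypothetical, and note that merge_update_columns and merge_exclude_columns cannot be combined.

```yaml
materialization:
  incremental:
    unique_key: order_id               # hypothetical key column
    strategy: merge
    merge_exclude_columns:
      - created_at                     # preserve the original insert timestamp on merge
    incremental_predicates:
      - "order_date >= '2024-01-01'"   # hypothetical filter condition
    on_schema_change: append_new_columns
```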

PartitionMaterialization

Container for options for how data is materialized and stored for the partitioned case.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partitioned | | PartitionedOptions | No | Options for partitioning data. |

PartitionedOptions

Options for partitioning data.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| enable_substitution_by_partition_name | | boolean | Yes | Enable substitution by partition name. |
| on_schema_change | | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. |
| output_type | table | string ("table", "view") | No | Output type for partitioned data. Must be either "table" or "view". |
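A partitioned materialization might then be configured as in this sketch; only enable_substitution_by_partition_name is required, and the other values shown are illustrative.

```yaml
materialization:
  partitioned:
    enable_substitution_by_partition_name: true
    on_schema_change: fail      # reject unexpected schema drift
    output_type: table          # the default; "view" is also allowed
```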

InputComponent

Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| flow | | string | Yes | Name of the parent flow that the input component belongs to. |
| name | | string | Yes | The input component name. |
| alias | | string | No | The alias to use for the input component. |
| partition_spec | | Any of: string ("full_reduction", "map"), RepartitionSpec | No | The type of partitioning to apply to the component's input data before the component's logic is executed. |
| where | | string | No | An optional filter condition to apply to the input component's data. |
| partition_binding | | Any of: string, PartitionBinding | No | An optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions. |
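An entry in a component's inputs list might look like this sketch; the flow, component, alias, and filter values are hypothetical placeholders.

```yaml
inputs:
  - flow: ingest_flow                 # hypothetical parent flow
    name: raw_events                  # hypothetical input component
    alias: events                     # refer to the input by this name
    partition_spec: map               # process each input partition independently
    where: "event_type = 'click'"     # hypothetical filter condition
```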

PartitionBinding

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| logical_operator | | string ("AND", "OR") | No | The logical operator used to combine the partition binding predicates provided. |
| predicates | | array[string] | No | The list of partition binding predicates to apply to the input component's data. |
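A structured partition binding might be written as in this sketch. The predicate syntax is not documented in this section, so the expression shown is a hypothetical illustration of binding an output partition to matching input partitions.

```yaml
partition_binding:
  logical_operator: AND
  predicates:
    - "this.event_date = other.event_date"   # hypothetical predicate syntax
```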

RepartitionSpec

Specification for repartitioning operations on input component's data

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| repartition | | RepartitionOptions | No | Options for repartitioning the input component's data. |

RepartitionOptions

Options for repartitioning the input component's data.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partition_by | | string | Yes | The column to partition by. |
| granularity | | string | Yes | The granularity to use for the partitioning. |
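Used as an input's partition_spec, a RepartitionSpec might look like this sketch; the column name and granularity value are hypothetical, and the accepted granularity values should be checked against your framework's documentation.

```yaml
partition_spec:
  repartition:
    partition_by: event_date   # hypothetical column to repartition on
    granularity: day           # hypothetical granularity value
```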