Python Transform
Python transforms execute Python code to transform data.
PythonTransform
PythonTransform
is defined beneath the following ancestor nodes in the YAML structure:
Below are the properties for the PythonTransform
. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
event_time | string | No | Timestamp column in the component output used to represent event time. | |
microbatch | boolean | No | Whether to process data in microbatches. | |
batch_size | string | No | The size/time granularity of the microbatch to process. | |
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode. |
begin | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run. | |
inputs | array[InputComponent] | No | List of input names to use in the SQL query. | |
materialization | Any of: PartitionMaterialization IncrementalMaterialization string ("view", "table") | No | Resource for data materialization during the transform process. | |
python | PythonTransformComponent | No | Python transform function to execute for transforming the data. |
Property Details
Component
A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | One of: ReadComponent TransformComponent TaskComponent SingularTestComponent CustomPythonReadComponent WriteComponent CompoundComponent AliasedTableComponent ExternalTableComponent | Yes | Configuration options for the component. |
TransformComponent
A component that executes SQL or Python code to transform data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | One of: SnowflakeDataPlane BigQueryDataPlane DuckdbDataPlane SynapseDataPlane | No | Data Plane-specific configuration options for a component. | |
name | string | No | The name of the model | |
description | string | No | A brief description of what the model does. | |
metadata | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. | |
flow_name | string | No | The name of the flow that the component belongs to. | |
skip | boolean | No | A boolean flag indicating whether to skip processing for the component or not. | |
data_maintenance | DataMaintenance | No | The data maintenance configuration options for the component. | |
skip_for_time_series_runs | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. | |
tests | ComponentTestColumn | No | Defines tests to run on the data of this component. | |
transform | One of: SqlTransform PythonTransform SnowparkTransform PySparkTransform | Yes | The transform component that executes SQL or Python code to transform data. |
PythonTransformComponent
Python transform function to execute for transforming the data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
entrypoint | string | Yes | The entrypoint for the python transform function. | |
source | string | Yes | The source file for the python transform function. |
IncrementalMaterialization
Container for options for how data is materialized and stored for incremental case.
Property | Default | Type | Required | Description |
---|---|---|---|---|
incremental | IncrementalOptions | No | Resource for partitioning data in storage, improving query performance and data organization. |
IncrementalOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
unique_key | string | No | Column or set of columns used as a unique identifier for records | |
strategy | string | No | Strategy to use for incremental data. Could be one of merge , partition or none . | |
merge_update_columns | Any of: string array[string] | No | List of columns to include when updating values in merge. Is mutually exclusive with merge_exclude_columns . | |
merge_exclude_columns | Any of: string array[string] | No | List of columns to exclude when updating values in merge. Is mutually exclusive with merge_update_columns . | |
incremental_predicates | Any of: string array[string] | No | list of conditions to filter incremental data | |
on_schema_change | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. |
PartitionMaterialization
Container for options for how data is materialized and stored for partitioned case.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partitioned | PartitionedOptions | No | Field for options for partitioning data. |
PartitionedOptions
Options for partitioning data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
enable_substitution_by_partition_name | boolean | Yes | Enable substitution by partition name. | |
on_schema_change | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. | |
output_type | table | string ("table", "view") | No | Output type for partitioned data. Must be either 'table' or 'view'. |
InputComponent
Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.
Property | Default | Type | Required | Description |
---|---|---|---|---|
flow | string | Yes | Name of the parent flow that the input component belongs to. | |
name | string | Yes | The input component name. | |
alias | string | No | The alias to use for the input component. | |
partition_spec | Any of: string ("full_reduction", "map") RepartitionSpec | No | The type of partitioning to apply to the component's input data before processing the component's logic. Input partitioning is applied before the component's logic is executed. | |
where | string | No | An optional filter condition to apply to the input component's data. | |
partition_binding | Any of: string PartitionBinding | No | An optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions. |
PartitionBinding
Property | Default | Type | Required | Description |
---|---|---|---|---|
logical_operator | logical_operator | string ("AND", "OR") | No | The logical operator to use to combine the partition binding predicates provided |
predicates | predicates | array[string] | No | The list of partition binding predicates to apply to the input component's data |
RepartitionSpec
Specification for repartitioning operations on input component's data
Property | Default | Type | Required | Description |
---|---|---|---|---|
repartition | RepartitionOptions | No | Options for repartitioning the input component's data. |
RepartitionOptions
Options for repartitioning the input component's data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | string | Yes | The column to partition by. | |
granularity | string | Yes | The granularity to use for the partitioning. |