Version: 3.0.0

Create a Smart Python Transform

In this guide, we'll build a Smart Python Transform that uses the reshape="map" option to apply custom per-partition transformations for optimized processing.

Prerequisites

Create a Transform

You can create a Transform in two ways: through the form UI or directly in the Files panel. The steps below use the form UI.

  1. Double-click the Flow where you want to add your Transform
  2. Right-click on an existing component (typically a Read component or another Transform) that will provide input data
  3. Select Create Downstream → Transform from the context menu
  4. Complete the form with these details:
    • Select your Flow
    • Enter a descriptive name for your Transform (e.g., sales_aggregation)
    • Choose the appropriate file type for your Transform logic

Create your Python Transform

Smart Python Transforms allow you to process data partition-by-partition for more efficient data handling. There are two main approaches:

Structure using upstream Partitions

  1. Import necessary packages:

    • Import Ascend resources like transform, ref, and optionally test
    • Import any required data processing libraries (like ibis in this example)
  2. Apply the @transform() decorator with:

    • inputs: Include a primary input with reshape="map" to enable partition-wise processing
    • Configure additional inputs with partition_binding to establish relationships between partitions
  3. Define your transform function:

    • Create a function that processes each partition independently
    • Use the context parameter to access partition-specific information
  4. Process and return data:

    • Apply your transformation logic to each partition
    • Return the processed data for each partition
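
As a mental model only, the steps above can be sketched in plain Python. This is not Ascend's API or implementation: run_map_reshape, add_total, and the dictionary-of-lists partition layout are hypothetical stand-ins. The sketch assumes that reshape="map" means the transform function runs once per upstream partition and that each result stays in its own output partition.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Partition = List[Row]


def run_map_reshape(
    partitions: Dict[str, Partition],
    fn: Callable[[Partition], Partition],
) -> Dict[str, Partition]:
    """Hypothetical helper: call fn once per partition and keep outputs separate."""
    return {name: fn(rows) for name, rows in partitions.items()}


def add_total(rows: Partition) -> Partition:
    """Example per-partition logic: derive a new column from existing ones."""
    return [{**row, "total": row["price"] * row["qty"]} for row in rows]


# Hypothetical upstream data: two partitions keyed by month.
upstream = {
    "2024-01": [{"price": 2, "qty": 3}],
    "2024-02": [{"price": 5, "qty": 1}, {"price": 1, "qty": 4}],
}
result = run_map_reshape(upstream, add_total)
```

Because the function only ever sees one partition at a time, a change to a single upstream partition would, under this assumption, only require recomputing the matching output partition.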

Example

smart_transform_upstream.py
"""
Example of a Smart Python Transform Component using upstream partitions.

This file demonstrates how to create a transform that uses the reshape="map" option
with upstream partition binding for optimized per-partition processing.
"""

from ibis import ir

from ascend.resources import ref, test, transform


@transform(
    inputs=[
        ref("primary_data", reshape="map"),
        ref("join_data", partition_binding="join_key_value = {{ ref('primary_data') }}.join_key"),
    ],
    tests=[test("count_equal", count=16)],  # 4 rows from join_data for each of 4 output partitions
)
def partition_binding(primary_data: ir.Table, join_data: ir.Table, context) -> ir.Table:
    """
    Process data using partition binding to efficiently join related partitions.

    This function demonstrates how to use partition binding to join data across partitions.
    Since we are binding join_data to primary_data on the join_key_value, we should get
    multiple partitions from join_data for each output partition processed.

    Args:
        primary_data: The main partitioned dataset
        join_data: Additional data bound to primary_data partitions
        context: Component execution context

    Returns:
        Processed join_data for the current partition
    """
    # Assert that we received the expected amount of data for this partition
    # This is to verify that partition binding is working as expected
    assert join_data.count().execute() == 4

    # In a real implementation, you would perform transformation logic here
    # For this example, we're simply returning the bound data
    return join_data
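
To see where the numbers 4 and 16 in the example come from, the binding can be imitated with plain Python lists. This is a rough illustration, not how Ascend evaluates partition_binding: the key values, the bound_rows helper, and the row layout are all made up, and the binding is modeled as a simple filter on join_key_value.

```python
from typing import Dict, List

# Hypothetical stand-in: four primary partitions, each identified by a join_key.
primary_keys = ["a", "b", "c", "d"]

# Hypothetical join_data rows: four rows per join_key_value.
join_rows: List[Dict[str, object]] = [
    {"join_key_value": key, "n": i} for key in primary_keys for i in range(4)
]


def bound_rows(join_key: str) -> List[Dict[str, object]]:
    """Rows of join_data that the binding would pair with one primary partition."""
    return [row for row in join_rows if row["join_key_value"] == join_key]


# One function call per primary partition, each seeing only its bound rows.
per_partition_counts = {key: len(bound_rows(key)) for key in primary_keys}
total_rows = sum(per_partition_counts.values())
```

Under these assumptions each call sees 4 bound rows, which corresponds to the in-function assertion, and the 4 output partitions together hold 16 rows, which corresponds to the count_equal test.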

🎉 Congratulations! You've successfully created a Smart Python Transform in Ascend.