Smart Python Transform

In this guide, we'll build a Smart Python Transform that uses the reshape="map" option to apply custom per-partition transformations for optimized processing.

To learn more about the supported input formats for Python Transforms, check out our concept guide.

Prerequisites

Create a Transform

You can create a Transform in two ways: through the form UI or directly in the Files panel.

  1. Double-click the Flow where you want to add your Transform
  2. Right-click on an existing component (typically a Read component or another Transform) that will provide input data
  3. Select Create Downstream > Transform from the context menu
  4. Complete the form with these details:
    • Select your Flow
    • Enter a descriptive name for your Transform (e.g., sales_aggregation)
    • Choose the appropriate file type for your Transform logic

Create your Python Transform

Smart Python Transforms allow you to process data partition-by-partition for more efficient data handling. There are two main approaches:

Structure using upstream Partitions

  1. Import necessary packages:

    • Import Ascend resources like transform, ref, and optionally test
    • Import any required data processing libraries (like ibis in this example)
  2. Apply the @transform() decorator with:

    • inputs: Include a primary input with reshape="map" to enable partition-wise processing
    • Configure additional inputs with partition_binding to establish relationships between partitions
  3. Define your transform function:

    • Create a function that processes each partition independently
    • Utilize the context parameter for partition-specific information
  4. Process and return data:

    • Apply your transformation logic to each partition
    • Return the processed data for each partition
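The steps above can be pictured without any Ascend machinery. The following is a minimal, library-free sketch of partition-wise ("map") processing, not the Ascend API: the helper `run_map_transform`, the `enrich` function, the `label` column, and the dict-of-lists partition layout are all hypothetical, chosen only to show a function being called once per primary partition with the matching join rows.

```python
from typing import Callable

Row = dict
Partition = list[Row]


def run_map_transform(
    primary_partitions: dict[str, Partition],
    join_rows: list[Row],
    fn: Callable[[Partition, list[Row]], Partition],
) -> dict[str, Partition]:
    """Call fn once per primary partition, passing only the join rows bound to it.

    Hypothetical stand-in for partition-wise execution: the binding rule here is
    join_key_value == join_key, mirroring the shape of the guide's binding string.
    """
    outputs: dict[str, Partition] = {}
    for name, partition in primary_partitions.items():
        keys = {row["join_key"] for row in partition}
        bound = [r for r in join_rows if r["join_key_value"] in keys]
        outputs[name] = fn(partition, bound)
    return outputs


def enrich(partition: Partition, bound: list[Row]) -> Partition:
    """Per-partition logic: attach a label from the bound join rows."""
    lookup = {r["join_key_value"]: r["label"] for r in bound}
    return [{**row, "label": lookup.get(row["join_key"])} for row in partition]


# Two made-up primary partitions and a small join table.
primary = {
    "p0": [{"id": 1, "join_key": "a"}, {"id": 2, "join_key": "a"}],
    "p1": [{"id": 3, "join_key": "b"}],
}
join_data = [
    {"join_key_value": "a", "label": "alpha"},
    {"join_key_value": "b", "label": "beta"},
]

result = run_map_transform(primary, join_data, enrich)
```

Each partition is processed in isolation, so `enrich` never sees rows from another partition. That isolation is what lets a change in one partition be handled without touching the others.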

Example

smart_transform_upstream.py
"""
Example of a Smart Python Transform Component using upstream partitions.

This file demonstrates how to create a transform that uses the reshape="map" option
with upstream partition binding for optimized per-partition processing.
"""

from ascend.resources import ref, test, transform
from ibis import ir


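@transform(
    inputs=[
        # Primary input: reshape="map" enables partition-wise processing.
        ref("primary_data", reshape="map"),
        # partition_binding relates join_data partitions to the current primary_data partition.
        ref(
            "join_data",
            partition_binding="join_key_value = {{ ref('primary_data') }}.join_key",
        ),
    ],
    tests=[test("count_equal", count=16)],  # 4 rows from join_data for each of 4 output partitions
)
def smart_transform_upstream(primary_data: ir.Table, join_data: ir.Table, context) -> ir.Table:
    # Illustrative body (an assumption, not from the original example): return the
    # join_data rows bound to the current primary_data partition.
    return join_data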
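The `count=16` expectation in the test follows from the comment on that line: four output partitions, each contributing four rows. As a rough, library-free sketch of that arithmetic, the snippet below uses a hypothetical `count_equal` helper (a plain-Python stand-in, not Ascend's test implementation) and made-up placeholder rows.

```python
def count_equal(rows: list, count: int) -> bool:
    """Hypothetical stand-in for a row-count test: pass only on an exact match."""
    return len(rows) == count


NUM_PARTITIONS = 4       # output partitions, per the comment in the example
ROWS_PER_PARTITION = 4   # join_data rows contributed by each partition

# Build placeholder output: one list of rows per partition.
outputs = [
    [{"partition": p, "row": r} for r in range(ROWS_PER_PARTITION)]
    for p in range(NUM_PARTITIONS)
]

# The test is evaluated against the combined output of all partitions.
all_rows = [row for part in outputs for row in part]
total = len(all_rows)
passed = count_equal(all_rows, 16)
```

If a partition produced fewer or more rows than expected, the combined count would drift from 16 and the check would fail, which is what makes an exact row count a useful guard on the partition binding.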

🎉 Congratulations! You've successfully created a Smart Python Transform in Ascend.