# Advanced Python patterns
This guide covers advanced Python Transform patterns including different input data formats, the context object, logging utilities, and vault access.
## Input data formats
The `@transform` decorator supports multiple input data formats via the `input_data_format` parameter. The default is Ibis.
### DuckDB PyRelation
Use DuckDB relations for SQL-based transformations with excellent performance:
### PyArrow
Use PyArrow for columnar data processing and efficient serialization:
### Dictionary format
Use dictionaries for simple column-based operations:
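A hypothetical standalone sketch, assuming the input arrives as a mapping of column names to lists (the `@transform` decorator is omitted so the snippet runs on its own):

```python
def flag_large(data: dict) -> dict:
    # Work column by column; copy first so the input mapping is not mutated
    out = dict(data)
    out["is_large"] = [v > 100 for v in out["value"]]
    return out

result = flag_large({"id": [1, 2, 3], "value": [50, 150, 250]})
# result["is_large"] → [False, True, True]
```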
### pandas
Use pandas for familiar DataFrame operations:
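A minimal standalone sketch (the column name is illustrative; the `@transform` decorator is omitted so the snippet runs on its own):

```python
import pandas as pd

def normalize_value(data: pd.DataFrame) -> pd.DataFrame:
    # Standard-score the "value" column on a copy of the frame
    out = data.copy()
    out["value"] = (out["value"] - out["value"].mean()) / out["value"].std()
    return out

df = pd.DataFrame({"value": [1.0, 2.0, 3.0]})
result = normalize_value(df)
```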
## Context object reference
The `context` parameter provides access to runtime information, parameters, and utilities.
### Runtime information
### Parameter access
```python
@transform(inputs=[ref("source")])
def my_transform(data, context):
    # Access component parameters
    threshold = context.parameters.get("threshold", 100)
    config = context.parameters.get("config", {})

    # Use in transformation
    return data.filter(data["value"] > threshold)
```
### Incremental processing
### Partition information
```python
from ascend.common.events import log

@transform(
    inputs=[ref("source", reshape="map")],
)
def partitioned_transform(data, context):
    # Get current partition values
    partition_id = context.partition_values.get("_ascend_partition_uuid")
    date_partition = context.partition_values.get("date")

    log(f"Processing partition: {partition_id}, date: {date_partition}")

    return data.mutate(partition_processed=partition_id)
```
### Temporary storage
```python
@transform(inputs=[ref("source")])
def transform_with_temp(data, context):
    # Access temporary directory for intermediate files
    tmp_dir = context.tmp_dir

    # Write intermediate data
    intermediate_path = f"{tmp_dir}/intermediate.parquet"
    data.to_parquet(intermediate_path)

    # Process and return
    return data
```
### Cache storage
```python
@transform(inputs=[ref("source")])
def transform_with_cache(data, context):
    # Store values in the cache
    context.set_cache("row_count", data.count().execute())
    context.set_cache("processed_columns", list(data.columns))

    # Later retrieval
    cached_count = context.get_cache("row_count")
    return data
```
## Vault access
Access secrets securely through the context rather than hardcoding credentials in your transform code.
## Logging utilities
Ascend provides logging utilities through `ascend.common.events`.
### Basic logging
```python
from ascend.common.events import log

@transform(inputs=[ref("source")])
def my_transform(data, context):
    log("Starting transformation")

    # Process data
    result = data.mutate(value=data.value * 2)

    log(f"Processed {result.count().execute()} rows")
    return result
```
### Debug logging with stacktrace
```python
from ascend.common.events import log

@transform(inputs=[ref("source")])
def debug_transform(data, context):
    # Include stacktrace for debugging
    log("Debug checkpoint reached", stacktrace=True)
    return data
```
### Performance timing
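A hedged standalone sketch of one way to time a step with plain `time.perf_counter` (inside a transform you would report the duration via `log` rather than `print`; the function name is illustrative):

```python
import time

def timed(step_name: str, fn, *args):
    # Run a step and report its wall-clock duration
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{step_name} took {elapsed:.4f}s")
    return result

total = timed("sum_step", sum, range(1000))
# → total == 499500
```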
### Exception logging
```python
from ascend.common.events import log_on_exception

@transform(inputs=[ref("source")])
def safe_transform(data, context):
    with log_on_exception("Error during data processing"):
        # Code that might raise an exception
        result = risky_operation(data)
    return result
```
### Conditional exception handling
```python
import contextlib

from ascend.common.events import log_on_exception

@transform(inputs=[ref("source")])
def selective_transform(data, context):
    # Log all exceptions except ValueError
    with contextlib.suppress(ValueError):
        with log_on_exception(
            "Processing error",
            where=lambda e: not isinstance(e, ValueError),
        ):
            result = validate_and_process(data)
    return data
```
## Smart partitioned readers
Create custom partitioned readers for efficient data ingestion.
## Schema evolution
Control how schema changes are handled:
```python
@transform(
    inputs=[ref("source")],
    on_schema_change="sync_all_columns",  # Options: "sync_all_columns", "fail", "ignore"
)
def evolving_schema_transform(data, context):
    """Handle schema changes gracefully."""
    return data
```
Options:

- `"sync_all_columns"`: Add new columns and update existing ones (recommended)
- `"fail"`: Fail on any schema change
- `"ignore"`: Ignore schema changes
## Best practices
- Choose the right input format: Use Ibis (default) for most cases; pandas for complex transformations; DuckDB for SQL-heavy operations
- Use type hints: Always include type hints for better documentation and IDE support
- Log meaningful information: Use structured logging for debugging and monitoring
- Handle errors gracefully: Use `log_on_exception` for better error visibility
- Access secrets via vaults: Never hardcode credentials; always use vault references
- Leverage caching: Use `context.set_cache()` for expensive computations that may be reused
## Next steps
- Learn about Simple Python Transforms
- Explore Incremental Python Transforms
- See Smart Python Transforms