# Test Python
In this guide, you'll learn how to add data quality and validation tests to your Python Components, create dedicated Python Test Components, and implement specialized tests for custom validation logic.
## Prerequisites

- An Ascend Flow

For a comprehensive overview of test types and when to use them, see our Tests concept guide.
## Native tests

Add native tests to Python Components by including them in the component decorator, such as `@transform` or `@pyspark`:

```python
from ascend.resources import transform, test


@transform(
    # ... other parameters ...
    tests=[
        test("test_name", column="column_name", parameter="value"),
        test("test_name", parameter="value"),  # Component-level test
    ],
)
def my_transform(spark, ctx):
    # ... transformation logic ...
```
### Column-level tests

Validate individual columns by specifying the `column` parameter. This example shows how to test that a specific column doesn't contain null values:

```python
import ascend_project_code.transform as T
import ibis
from ascend.application.context import ComponentExecutionContext
from ascend.resources import ref, test, transform


@transform(
    inputs=[ref("read_sales_stores", flow="extract-load-databricks")],
    materialized="table",
    tests=[test("not_null", column="timestamp")],
)
def sales_stores(read_sales_stores: ibis.Table, context: ComponentExecutionContext) -> ibis.Table:
    sales_stores = T.clean(read_sales_stores)
    return sales_stores
```
### Component-level tests

Validate the entire dataset by omitting the `column` parameter. This example demonstrates testing row counts and partition-level metrics: the `count_distinct_equal` test asserts that the monthly reshape of the input produces exactly four partitions.

```python
from ascend.resources import pyspark, ref, test
from pyspark.sql import DataFrame, SparkSession


@pyspark(
    inputs=[ref("green_cab", reshape={"time": {"column": "lpep_pickup_datetime", "granularity": "month"}})],
    tests=[
        test("not_null", column="VendorID"),
        test("count_equal", count=65471),
        test("count_distinct_equal", column="_ascend_partition_uuid", count=4),
    ],
)
def pyspark_smart_reshape_with_partitioned_sql_input(spark: SparkSession, green_cab: DataFrame, context) -> DataFrame:
    df = green_cab
    return df
```
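While developing, you can sanity-check the partition-level metric directly inside the transform. A minimal sketch, assuming `_ascend_partition_uuid` is present on the reshaped input as in the test above (the helper name and the print are illustrative only):

```python
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def debug_partition_count(df: DataFrame) -> DataFrame:
    # Count distinct Ascend partition IDs; with a monthly reshape this
    # should match the count_distinct_equal test above (4 partitions).
    n = df.select(F.countDistinct("_ascend_partition_uuid")).first()[0]
    print(f"distinct partitions: {n}")
    return df
```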
## Test Components
Create dedicated Python Test Components for custom validation logic that goes beyond native tests.
### Generic test example

Generic tests are reusable functions that can be applied to any dataset. This example creates a statistical test that validates that data skewness stays within an acceptable range:

```python
from ascend.resources import TestResult, generic_test
from ibis import ir


@generic_test
def skew_in_range(context, component: ir.Table, column, min, max):
    # Compute the column's median, mean, and standard deviation.
    median = context.connection.execute(component[column].quantile(0.5)).item()
    mean = context.connection.execute(component[column].mean()).item()
    stddev = context.connection.execute(component[column].std()).item()
    # Pearson's second skewness coefficient.
    skew = (3 * (mean - median)) / stddev
    if skew > max or skew < min:
        return TestResult.empty("skew_in_range", False, f'skew of "{column}" is {skew}, which is outside specified range - expected: {min} <= skew <= {max}')
    return TestResult.empty("skew_in_range", True)
```
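The statistic here is Pearson's second skewness coefficient, `3 * (mean - median) / stddev`, which is near zero for roughly symmetric data.

Once defined, a generic test should be attachable to a component. The sketch below is illustrative only: it assumes custom generic tests are referenced by name in a component's `tests` list, just like the built-in tests shown earlier, and the component, column, and bounds are hypothetical:

```python
from ascend.resources import ref, test, transform


@transform(
    inputs=[ref("sales_stores")],  # hypothetical upstream component
    tests=[
        # Reference the custom generic test by name (assumed convention);
        # min/max bound the acceptable skewness of the amount column.
        test("skew_in_range", column="amount", min=-1.0, max=1.0),
    ],
)
def checked_sales(sales_stores, context):
    return sales_stores
```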
### Singular test example

Singular tests validate a specific dataset with custom business logic. This example checks that values in a particular column fall within expected bounds:

```python
from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("epl_results_by_month", flow="publisher")], severity="error")
def singular_test_python(context, epl_results_by_month):
    # Cast AC to int after excluding the "NA" placeholder values.
    filtered = epl_results_by_month.filter(epl_results_by_month.AC != "NA").mutate(ac_int=epl_results_by_month.AC.cast("int")).select("ac_int")
    # Rows outside the expected 0-19 range fail the test.
    failed = filtered.filter((filtered.ac_int < 0) | (filtered.ac_int > 19))
    if context.connection.execute(failed.count()) > 0:
        return TestResult(
            "singular_test_python",
            False,
            failed.limit(5),  # include a sample of failing rows in the result
            "AC values expected to be between 0 and 19",
        )
    return TestResult.empty("singular_test_python", True)
```
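If the ibis chain is unfamiliar, the same bounds check can be run standalone. This sketch is illustrative only: it uses `ibis.memtable` with made-up data and ibis's default local backend rather than an Ascend connection:

```python
import ibis

# Toy table standing in for epl_results_by_month; values are made up.
t = ibis.memtable({"AC": ["3", "NA", "25"]})

filtered = t.filter(t.AC != "NA").mutate(ac_int=t.AC.cast("int")).select("ac_int")
failed = filtered.filter((filtered.ac_int < 0) | (filtered.ac_int > 19))

print(failed.execute())  # one failing row: ac_int == 25
```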
## Next steps
- Explore SQL tests
- Learn about YAML tests
- Review the Tests concept guide for comprehensive test strategy guidance