Add data tests to your Flow Graph
Data quality and validation tests ensure the integrity of your data pipelines by verifying that data meets expected conditions.
Ascend provides testing at multiple levels, from native column tests to dedicated Test Components, so that your Flow Graph produces reliable outputs.
Prerequisites
- Ascend Flow containing SQL and/or Python Components
What you'll learn
- How to implement different types of data tests in your Flow Graph
- When to use native column-level and Component-level tests
- How to create custom Test Components for complex validations
- How to implement Singular tests for specialized test cases
Test types
| Test Type | Description |
| --- | --- |
| Column-level test | Pre-defined tests that validate individual columns in your data. These tests check properties like non-null values, uniqueness, and value ranges. Add them directly to Component decorators or as Jinja blocks in SQL. |
| Component-level test | Pre-defined tests that validate the entire output of a Component rather than specific columns. These tests check row counts, schema structure, and other Component-level properties. Add them directly to Component decorators or as Jinja blocks in SQL. |
| Test Component | Dedicated Components whose sole purpose is testing. These Components contain custom logic in Python or SQL and perform complex validations that aren't possible with native tests. |
| Singular test | A kind of Test Component decorated with @singular_test that runs custom validation logic and returns a TestResult object. Use these for specialized test cases that require custom processing. |
Native tests
Native tests are reusable, pre-built validators for your data. Apply them to both Python and SQL Components to verify data quality at two levels:
- Column level: Validate specific columns (e.g., check for null values or unique entries)
- Component level: Validate entire datasets (e.g., verify row counts or schema structure)
You'll know it's a Component-level test when no column arguments are passed!
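To make the distinction concrete, here's a minimal sketch (the orders input, order_id column, and expected count are hypothetical) that attaches both levels of native test to one Transform:

from ascend.resources import ref, test, transform


@transform(
    inputs=[ref("orders")],  # hypothetical upstream Component
    tests=[
        test("not_null", column="order_id"),  # column-level: takes a column argument
        test("count_equal", count=100),  # Component-level: no column argument
    ],
)
def orders_checked(orders, context):
    # Pass the data through unchanged; the attached tests validate this output
    return orders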
Add native tests to YAML Components by specifying a tests section:
component:
  ...
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2
Add native tests to Python Component decorators by referencing their names:
@transform(
    # ... other parameters ...
    tests=[
        test("not_null", column="timestamp"),
        test("unique", column="store_id"),
    ],
)
Add native tests to SQL Components by including a with_test Jinja block:
/* SELECT statement */
{{ with_test("not_null", column="id") }}
To explore all available options for native tests, refer to our column-level test guide and our Component-level test guide.
YAML BigQuery Read Component with Component and column tests
This YAML Read Component demonstrates both Component-level and column-level tests:
- Component test: Verifies the total record count equals 8
- Column tests for _ascend_partition_uuid:
  - Checks for null values
  - Ensures values are not empty strings
  - Confirms exactly 2 distinct partition UUIDs exist
- Column tests for timestamp:
  - Validates all timestamps are before August 27, 2023
  - Confirms all timestamps are after August 25, 2023
component:
  read:
    connection: bigquery_rc
    bigquery:
      table:
        dataset: my_dataset
        name: my_table
      partitioning:
        time:
          include:
            - eq: 20230825
            - on_or_after: 2023-08-26
              before: 2023-08-27
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2
      timestamp:
        - less_than:
            value: "'2023-08-27 00:00:00'"
        - greater_than:
            value: "'2023-08-25 00:00:00'"
SQL Transform with multiple column tests
This SQL Transform runs four different tests:
- Checks that the id column is unique
- Checks that the feedback_text column is not empty
- Checks that the timestamp column is not null
- Checks that the timestamp column dates fall between 2023-01-01 and 2025-12-31
SELECT
    *
FROM
    {{ ref("my_table", flow="my_flow") }}
ORDER BY
    timestamp DESC
{{ with_test("unique", column="id") }}
{{ with_test("not_empty", column="feedback_text") }}
{{ with_test("not_null", column="timestamp") }}
{{ with_test("date_in_range", column="timestamp", min="2023-01-01", max="2025-12-31") }}
SQL Transform with Component test
This example checks that the record count is 75,471:
SELECT
    *
FROM
    {{ ref("my_table", flow="my_flow") }}
ORDER BY
    timestamp DESC
{{ with_test("count_equal", count=75471) }}
PySpark Transform with multiple column tests
This example runs three different tests:
- Checks for null values in VendorID
- Verifies the total row count equals 65,471
- Confirms the number of distinct partition UUIDs equals 4
from ascend.resources import pyspark, ref, test
from pyspark.sql import DataFrame, SparkSession


@pyspark(
    inputs=[ref("green_cab", reshape={"time": {"column": "lpep_pickup_datetime", "granularity": "month"}})],
    tests=[
        test("not_null", column="VendorID"),
        test("count_equal", count=65471),
        test("count_distinct_equal", column="_ascend_partition_uuid", count=4),
    ],
)
def pyspark_smart_reshape_with_partitioned_sql_input(spark: SparkSession, green_cab: DataFrame, context) -> DataFrame:
    # Pass the input through unchanged; the attached tests validate this output
    df = green_cab
    return df
Python Transform with Component test
This example checks that the record count is four:
import ascend_project_code.transform as T
import ibis
from ascend.application.context import ComponentExecutionContext
from ascend.resources import ref, test, transform


@transform(
    inputs=[ref("read_sales_stores", flow="extract-load-databricks")],
    materialized="table",
    tests=[
        test("count_equal", count=4),
    ],
)
def sales_stores(read_sales_stores: ibis.Table, context: ComponentExecutionContext) -> ibis.Table:
    sales_stores = T.clean(read_sales_stores)
    return sales_stores
Test Components
Create dedicated Python and SQL Test Components to perform custom test logic on your data.
Test Components differ from native tests: instead of applying a pre-built validator, they run custom logic that you write, and their sole purpose is to test data rather than produce it.
To learn more about Test Components, refer to our Test Component guide.
SQL Test Component
This SQL Component tests for null values in the name column; any rows the query returns are reported as test failures:
SELECT
    *
FROM
    {{ ref("ascenders") }}
WHERE
    name IS NULL
Python Test Component
This Python Component, written as a singular test, verifies that the ascenders table is not empty:
from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("ascenders")], severity="error")
def test_ascenders_py(context, ascenders):
    if ascenders.count().to_pyarrow().as_py() > 0:
        # Pass: report success with no failing rows attached
        return TestResult.empty("test_ascenders_py", True)
    else:
        # Fail: attach the table and a message explaining the expectation
        return TestResult(
            "test_ascenders_py",
            False,
            ascenders,
            "ascenders must be non-empty, please check the data",
        )
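Native tests validate one column or the whole output, but a Test Component can relate columns to each other. Here's a minimal sketch (the start_date and end_date columns on ascenders are hypothetical) of a cross-column check that fails whenever a row ends before it starts:

from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("ascenders")], severity="error")
def test_dates_ordered(context, ascenders):
    # Hypothetical columns: any row where end_date precedes start_date is a failure
    failed = ascenders.filter(ascenders.end_date < ascenders.start_date)
    if context.connection.execute(failed.count()) > 0:
        return TestResult(
            "test_dates_ordered",
            False,
            failed.limit(5),
            "end_date must not precede start_date",
        )
    return TestResult.empty("test_dates_ordered", True)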
Singular tests
Singular tests are one-off Python assertions that run custom logic against your data.
Create singular tests as Python functions decorated with @singular_test that contain custom validation logic and return a TestResult object.
To explore all available options for singular tests, refer to our singular test guide.
Custom range Singular Test
This example tests that all values in the AC column, after filtering and conversion to integers, fall between 0 and 19:
from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("epl_results_by_month", flow="publisher")], severity="error")
def singular_test_python(context, epl_results_by_month):
    # Drop "NA" placeholders, then cast the remaining AC values to integers
    filtered = (
        epl_results_by_month.filter(epl_results_by_month.AC != "NA")
        .mutate(ac_int=epl_results_by_month.AC.cast("int"))
        .select("ac_int")
    )
    # Any value outside the 0-19 range counts as a failure
    failed = filtered.filter((filtered.ac_int < 0) | (filtered.ac_int > 19))
    if context.connection.execute(failed.count()) > 0:
        return TestResult(
            "singular_test_python",
            False,
            failed.limit(5),
            "AC values expected to be between 0 and 19",
        )
    return TestResult.empty("singular_test_python", True)
🎉 Congratulations! You just learned all about data validation and quality testing in Ascend.