
Add data tests to your Flow Graph

Data quality and validation tests ensure the integrity of your data pipelines by verifying that data meets expected conditions.

Ascend provides testing at multiple levels — from native column tests to Test Components — to ensure that your Flow Graph produces reliable outputs.

Prerequisites

  • An Ascend Flow containing SQL and/or Python Components

What you'll learn

  • How to implement different types of data tests in your Flow Graph
  • When to use native column-level and Component-level tests
  • How to create custom Test Components for complex validations
  • How to implement Singular tests for specialized test cases

Test types

  • Column-level test: Pre-defined tests that validate individual columns in your data. These tests check properties like non-null values, uniqueness, and value ranges. Add them directly to Component decorators or as Jinja blocks in SQL.
  • Component-level test: Pre-defined tests that validate the entire output of a Component rather than specific columns. These tests check table counts, schema validation, and other Component-level properties. Add them directly to Component decorators or as Jinja blocks in SQL.
  • Test Component: Dedicated Components whose sole purpose is testing. These Components contain custom logic in Python or SQL and perform complex validations that aren't possible with native tests.
  • Singular test: A kind of Test Component, decorated with @singular_test, that runs custom validation logic and returns a TestResult object. Use these for specialized test cases that require custom processing.

Native tests

Native tests are reusable, pre-built validators for your data. Apply them to both Python and SQL Components to verify data quality at two levels:

  • Column level: Validate specific columns (e.g., check for null values or unique entries)
  • Component level: Validate entire datasets (e.g., verify row counts or schema structure)
tip

You'll know it's a Component-level test when no column arguments are passed!
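
For example, with the same test() helper used in the decorator examples below, only the column argument distinguishes the two levels (the count value here is illustrative):

test("not_null", column="id")   # column-level: validates a specific column
test("count_equal", count=100)  # Component-level: no column argument, validates the whole output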

Add native tests to YAML Components by specifying a tests section:

component:
  ...
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2

Add native tests to Python Component decorators by referencing their names:

@transform(
    # ... other parameters ...
    tests=[
        test("not_null", column="timestamp"),
        test("unique", column="store_id"),
    ],
)

Add native tests to SQL Components by including a with_test Jinja block:

/* SELECT statement */
{{ with_test("not_null", column="id") }}

To explore all available options for native tests, refer to our column-level test guide and our Component-level test guide.

YAML BigQuery Read Component with Component and column tests

This YAML Read Component demonstrates both Component-level and column-level tests:

  • Component test: Verifies the total record count equals 8
  • Column tests for _ascend_partition_uuid:
    • Checks for null values
    • Ensures values are not empty strings
    • Confirms exactly 2 distinct partition UUIDs exist
  • Column tests for timestamp:
    • Validates all timestamps are before August 27, 2023
    • Confirms all timestamps are after August 25, 2023
read_bigquery.yaml
component:
  read:
    connection: bigquery_rc
    bigquery:
      table:
        dataset: my_dataset
        name: my_table
      partitioning:
        time:
          include:
            - eq: 20230825
            - on_or_after: 2023-08-26
              before: 2023-08-27
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2
      timestamp:
        - less_than:
            value: "'2023-08-27 00:00:00'"
        - greater_than:
            value: "'2023-08-25 00:00:00'"

SQL Transform with multiple column tests

This SQL Transform runs four different tests:

  • Checks that the id column is unique
  • Checks that the feedback_text column is not empty
  • Checks that the timestamp column is not null
  • Checks that the timestamp column dates fall between 2023-01-01 and 2025-12-31
SELECT
  *
FROM
  {{ ref("my_table", flow="my_flow") }}
ORDER BY
  timestamp DESC

{{ with_test("unique", column="id") }}
{{ with_test("not_empty", column="feedback_text") }}
{{ with_test("not_null", column="timestamp") }}
{{ with_test("date_in_range", column="timestamp", min="2023-01-01", max="2025-12-31") }}

SQL Transform with Component test

This example checks that the record count is 75,471:

component_test.sql
SELECT
  *
FROM
  {{ ref("my_table", flow="my_flow") }}
ORDER BY
  timestamp DESC

{{ with_test("count_equal", count=75471) }}

PySpark Transform with multiple column tests

This example runs three different tests:

  • Checks for null values in VendorID
  • Verifies the total row count equals 65,471
  • Confirms the number of distinct partition UUIDs equals 4
pyspark_native_column_tests.py
from ascend.resources import pyspark, ref, test
from pyspark.sql import DataFrame, SparkSession


@pyspark(
    inputs=[ref("green_cab", reshape={"time": {"column": "lpep_pickup_datetime", "granularity": "month"}})],
    tests=[
        test("not_null", column="VendorID"),
        test("count_equal", count=65471),
        test("count_distinct_equal", column="_ascend_partition_uuid", count=4),
    ],
)
def pyspark_smart_reshape_with_partitioned_sql_input(spark: SparkSession, green_cab: DataFrame, context) -> DataFrame:
    df = green_cab
    return df

Python Transform with Component test

This example checks that the record count is four:

native_column_tests.py
import ascend_project_code.transform as T
import ibis
from ascend.application.context import ComponentExecutionContext
from ascend.resources import ref, test, transform


@transform(
    inputs=[ref("read_sales_stores", flow="extract-load-databricks")],
    materialized="table",
    tests=[
        test("count_equal", count=4),
    ],
)
def sales_stores(read_sales_stores: ibis.Table, context: ComponentExecutionContext) -> ibis.Table:
    sales_stores = T.clean(read_sales_stores)
    return sales_stores

Test Components

Create dedicated Python and SQL Test Components to perform custom test logic on your data.

To learn more about Test Components, refer to our Test Component guide.

note

Test Components differ from native tests in that they contain custom logic you write yourself and serve no purpose beyond testing: their sole job is to validate data, not to transform or produce it.

SQL Test Component

This SQL Component tests for null values in the name column by selecting any offending rows:

test_ascenders.sql
SELECT
  *
FROM
  {{ ref("ascenders") }}
WHERE
  name IS NULL

Python Test Component

This Python Component verifies that the ascenders table is not empty:

test_ascenders.py
from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("ascenders")], severity="error")
def test_ascenders_py(context, ascenders):
    if ascenders.count().to_pyarrow().as_py() > 0:
        return TestResult.empty("test_ascenders_py", True)
    else:
        return TestResult(
            "test_ascenders_py",
            False,
            ascenders,
            "ascenders must be non-empty, please check the data",
        )

Singular tests

Singular tests are one-off Python assertions that run custom logic against your data.

Create singular tests as Python functions decorated with @singular_test that contain custom validation logic and return a TestResult object.

To explore all available options for singular tests, refer to our singular test guide.

Custom range Singular test

This example tests that all values in the AC column, after filtering and conversion to integers, fall between 0 and 19:

singular_test.py
from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("epl_results_by_month", flow="publisher")], severity="error")
def singular_test_python(context, epl_results_by_month):
    filtered = (
        epl_results_by_month.filter(epl_results_by_month.AC != "NA")
        .mutate(ac_int=epl_results_by_month.AC.cast("int"))
        .select("ac_int")
    )
    failed = filtered.filter((filtered.ac_int < 0) | (filtered.ac_int > 19))

    if context.connection.execute(failed.count()) > 0:
        return TestResult(
            "singular_test_python",
            False,
            failed.limit(5),
            "AC values expected to be between 0 and 19",
        )

    return TestResult.empty("singular_test_python", True)

🎉 Congratulations! You just learned all about data validation and quality testing in Ascend.