
Add data tests to your Flow Graph

Data quality and validation tests ensure the integrity of your data pipelines by verifying that data meets expected conditions.

Ascend provides testing at multiple levels — from native column tests to Test Components — to ensure that your Flow Graph produces reliable outputs.

Prerequisites

  • An Ascend Flow containing SQL and/or Python Components

What you'll learn

  • How to implement different types of data tests in your Flow Graph
  • When to use native column-level and Component-level tests
  • How to create custom Test Components for complex validations
  • How to implement Singular tests for specialized test cases

Test types

  • Column-level test: Pre-defined tests that validate individual columns in your data. These tests check properties like non-null values, uniqueness, and value ranges. Add them directly to Component decorators or as Jinja blocks in SQL.
  • Component-level test: Pre-defined tests that validate the entire output of a Component rather than specific columns. These tests check table counts, schema validation, and other Component-level properties. Add them directly to Component decorators or as Jinja blocks in SQL.
  • Test Component: Dedicated Components whose sole purpose is testing. These Components contain custom logic in Python or SQL and perform complex validations that aren't possible with native tests.
  • Singular test: A kind of Test Component, decorated with @singular_test, that runs custom validation logic and returns a TestResult object. Use these for specialized test cases that require custom processing.

Native tests

Native tests are reusable, pre-built validators for your data. Apply them to both Python and SQL Components to verify data quality at two levels:

  • Column level: Validate specific columns (e.g., check for null values or unique entries)
  • Component level: Validate entire datasets (e.g., verify row counts or schema structure)
tip

You'll know it's a Component-level test when no column arguments are passed!
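
For example, with the same test() helper used in the decorator examples below, only the column argument distinguishes the two levels (the count value here is illustrative):

test("not_null", column="id")   # column-level: validates a specific column
test("count_equal", count=100)  # Component-level: no column argument, validates the whole output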

Add native tests to YAML Components by specifying a tests section:

component:
  ...
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2

Add native tests to Python Component decorators by referencing their names:

@transform(
    # ... other parameters ...
    tests=[
        test("not_null", column="timestamp"),
        test("unique", column="store_id"),
    ],
)

Add native tests to SQL Components by including a with_test Jinja block:

/* SELECT statement */
{{ with_test("not_null", column="id") }}

To explore all available options for native tests, refer to our column-level test guide and our Component-level test guide.

YAML BigQuery Read Component with Component and column tests

This YAML Read Component demonstrates both Component-level and column-level tests:

  • Component test: Verifies the total record count equals 8
  • Column tests for _ascend_partition_uuid:
    • Checks for null values
    • Ensures values are not empty strings
    • Confirms exactly 2 distinct partition UUIDs exist
  • Column tests for timestamp:
    • Validates all timestamps are before August 27, 2023
    • Confirms all timestamps are after August 25, 2023
read_bigquery.yaml
component:
  read:
    connection: bigquery_rc
    bigquery:
      table:
        dataset: my_dataset
        name: my_table
      partitioning:
        time:
          include:
            - eq: 20230825
            - on_or_after: 2023-08-26
              before: 2023-08-27
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2
      timestamp:
        - less_than:
            value: "'2023-08-27 00:00:00'"
        - greater_than:
            value: "'2023-08-25 00:00:00'"

SQL Transform with multiple column tests

This SQL Transform runs four different tests:

  • Checks that the id column is unique
  • Checks that the feedback_text column is not empty
  • Checks that the timestamp column is not null
  • Checks that the timestamp column dates fall between 2023-01-01 and 2025-12-31
SELECT
  *
FROM
  {{ ref("my_table", flow="my_flow") }}
ORDER BY
  timestamp DESC

{{ with_test("unique", column="id") }}
{{ with_test("not_empty", column="feedback_text") }}
{{ with_test("not_null", column="timestamp") }}
{{ with_test("date_in_range", column="timestamp", min="2023-01-01", max="2025-12-31") }}

SQL Transform with Component test

This example checks that the record count is 75,471:

component_test.sql
SELECT
  *
FROM
  {{ ref("my_table", flow="my_flow") }}
ORDER BY
  timestamp DESC

{{ with_test("count_equal", count=75471) }}

PySpark Transform with multiple column tests

This example runs three different tests:

  • Checks for null values in VendorID
  • Verifies the total row count equals 65,471
  • Confirms the number of distinct partition UUIDs equals 4
pyspark_native_column_tests.py
from ascend.resources import pyspark, ref, test
from pyspark.sql import DataFrame, SparkSession


@pyspark(
    inputs=[ref("green_cab", reshape={"time": {"column": "lpep_pickup_datetime", "granularity": "month"}})],
    tests=[
        test("not_null", column="VendorID"),
        test("count_equal", count=65471),
        test("count_distinct_equal", column="_ascend_partition_uuid", count=4),
    ],
)
def pyspark_smart_reshape_with_partitioned_sql_input(spark: SparkSession, green_cab: DataFrame, context) -> DataFrame:
    df = green_cab
    return df

Python Transform with Component test

This example checks that the record count is four:

native_column_tests.py
import ascend_project_code.transform as T
import ibis
from ascend.application.context import ComponentExecutionContext
from ascend.resources import ref, test, transform


@transform(
    inputs=[ref("read_sales_stores", flow="extract-load-databricks")],
    materialized="table",
    tests=[
        test("count_equal", count=4),
    ],
)
def sales_stores(read_sales_stores: ibis.Table, context: ComponentExecutionContext) -> ibis.Table:
    sales_stores = T.clean(read_sales_stores)
    return sales_stores

Test Components

Create dedicated Python and SQL Test Components to perform custom test logic on your data.

To learn more about Test Components, refer to our Test Component guide.

note

Test Components differ from native tests in that they contain custom logic you write yourself and serve no purpose beyond testing: their sole job is to validate data, not to transform or produce it.

SQL Test Component

This SQL Component tests for null values in the name column by selecting any offending rows:

test_ascenders.sql
SELECT
  *
FROM
  {{ ref("ascenders") }}
WHERE
  name IS NULL

Python Test Component

This Python Component verifies that the ascenders table is not empty:

test_ascenders.py
from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("ascenders")], severity="error")
def test_ascenders_py(context, ascenders):
    if ascenders.count().to_pyarrow().as_py() > 0:
        return TestResult.empty("test_ascenders_py", True)
    else:
        return TestResult(
            "test_ascenders_py",
            False,
            ascenders,
            "ascenders must be non-empty, please check the data",
        )

Singular tests

Singular tests are one-off Python assertions that run custom logic against your data.

Create singular tests as Python functions decorated with @singular_test that contain custom validation logic and return a TestResult object.

To explore all available options for singular tests, refer to our singular test guide.

Custom range Singular test

This example tests that all values in the AC column, after filtering and conversion to integers, fall between 0 and 19:

singular_test.py
from ascend.resources import TestResult, ref, singular_test


@singular_test(inputs=[ref("epl_results_by_month", flow="publisher")], severity="error")
def singular_test_python(context, epl_results_by_month):
    filtered = (
        epl_results_by_month.filter(epl_results_by_month.AC != "NA")
        .mutate(ac_int=epl_results_by_month.AC.cast("int"))
        .select("ac_int")
    )
    failed = filtered.filter((filtered.ac_int < 0) | (filtered.ac_int > 19))

    if context.connection.execute(failed.count()) > 0:
        return TestResult(
            "singular_test_python",
            False,
            failed.limit(5),
            "AC values expected to be between 0 and 19",
        )

    return TestResult.empty("singular_test_python", True)

🎉 Congratulations! You just learned all about data validation and quality testing in Ascend.