Test YAML

In this guide, you'll learn how to add data quality and validation tests to your YAML Components to ensure data integrity in your pipelines.

Prerequisites

For a comprehensive overview of test types and when to use them, see our Tests concept guide.

Test behavior

Tests accept a severity parameter that can be set to error or warn.

error is the default severity, meaning that failed tests cause the entire Component to fail. To log warnings instead of failing, set severity: warn:

columns:
  id:
    - not_null:
        severity: warn
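As a further sketch, severity can be mixed within one column so that only some failures stop the run. This assumes severity sits alongside a test's other parameters, as in the complete example later in this guide; the age column is hypothetical.

```yaml
tests:
  columns:
    age:                      # hypothetical column name
      - not_null              # no severity given, so the default (error) applies
      - in_range:
          min: 0
          max: 120
          severity: warn      # a failure here is logged as a warning only
```

With this layout, a NULL age fails the Component, while an out-of-range age only produces a warning.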

Test format

YAML Components support column-level, Component-level, and schema tests through a dedicated tests section:

component:
  # ... component configuration ...
  tests:
    component:
      - test_name:
          parameter: value
    columns:
      column_name:
        - test_name
        - test_name_with_params:
            parameter: value
    schema:
      match: exact
      columns:
        id: int
        name: string

Column-level tests

Column-level tests validate individual columns in your dataset.

Basic validation

Check for null values, empty strings, and uniqueness:

tests:
  columns:
    user_id:
      - not_null
      - not_empty
      - unique

Numeric range tests

Validate that numeric values fall within expected ranges:

tests:
  columns:
    age:
      - in_range:
          min: 0
          max: 120
    price:
      - greater_than:
          value: 0
      - less_than_or_equal:
          value: 1000000

Available comparison tests:

  • greater_than: Values strictly greater than threshold
  • less_than: Values strictly less than threshold
  • greater_than_or_equal: Values greater than or equal to threshold
  • less_than_or_equal: Values less than or equal to threshold
  • in_range: Values within min/max range (inclusive)
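The strict and inclusive variants in the list above can be combined to express a half-open interval. The following is a minimal sketch using a hypothetical discount_pct column; the boundary comments follow from the descriptions in the list.

```yaml
tests:
  columns:
    discount_pct:             # hypothetical column name
      - greater_than_or_equal:
          value: 0            # inclusive lower bound: 0 passes
      - less_than:
          value: 100          # strict upper bound: 100 fails
```

When both bounds are inclusive, in_range with min and max expresses the same check in a single test.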

Date range tests

Validate that date values fall within an expected range:

tests:
  columns:
    created_at:
      - date_in_range:
          min: "2023-01-01"
          max: "2024-12-31"

Set membership tests

Validate that values belong to an allowed set:

tests:
  columns:
    status:
      - in_set:
          values:
            - pending
            - approved
            - rejected
    country_code:
      - in_set:
          values: [US, CA, MX, UK]

String pattern tests

Validate string content:

tests:
  columns:
    email:
      - substring_match:
          substring: "@"

Statistical tests

Validate statistical properties of numeric columns:

tests:
  columns:
    temperature:
      - mean_in_range:
          min: 60
          max: 80
      - stddev_in_range:
          min: 0
          max: 15

Distinct count tests

Validate the number of distinct values:

tests:
  columns:
    category:
      - count_distinct_equal:
          count: 5
    region:
      - count_distinct_equal:
          count: 4
          group_by_columns:
            - country

Component-level tests

Component-level tests validate the entire output of your Component.

Row count tests

Verify exact or bounded row counts:

tests:
  component:
    - count_equal:
        count: 1000
    - count_greater_than:
        count: 0
    - count_less_than:
        count: 1000000

Available count tests:

  • count_equal: Exactly N rows
  • count_greater_than: More than N rows
  • count_less_than: Fewer than N rows
  • count_greater_than_or_equal: At least N rows
  • count_less_than_or_equal: At most N rows
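The inclusive variants in the list above are useful when the bounds themselves are acceptable row counts. A minimal sketch, with illustrative thresholds:

```yaml
tests:
  component:
    - count_greater_than_or_equal:
        count: 1              # at least one row; exactly 1 passes
    - count_less_than_or_equal:
        count: 1000000        # at most one million rows; exactly 1000000 passes
```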

Grouped count tests

Validate row counts within groups:

tests:
  component:
    - count_greater_than:
        count: 10
        group_by_columns:
          - region
          - product_category
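The same pattern can put an upper bound on group size. The sketch below assumes the threshold is checked against each group separately, which this guide implies but does not spell out; the customer_id column is hypothetical. The test reference at the end of this guide lists group_by_columns as an optional parameter of count_less_than_or_equal.

```yaml
tests:
  component:
    - count_less_than_or_equal:
        count: 100            # illustrative per-group limit
        group_by_columns:
          - customer_id       # hypothetical grouping column
```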

Combination uniqueness tests

Validate that combinations of columns are unique:

tests:
  component:
    - combination_unique:
        columns:
          - order_id
          - line_item_id

Schema tests

Schema tests validate the structure and data types of your Component output:

tests:
  schema:
    match: exact
    columns:
      id: int
      name: string
      price: double
      created_at: timestamp

The match parameter controls validation behavior:

  • exact: All columns must match exactly (no extra columns allowed)
  • ignore_missing: Only validates listed columns; extra columns are allowed

tests:
  schema:
    match: ignore_missing
    columns:
      id: int
      name: string

Custom SQL tests

Create reusable custom tests by defining SQL test macros. A custom test is a query that selects the rows that fail validation, so a query that returns no rows means the test passes.

Define a custom test

Create a SQL file with the test definition:

tests/macros/custom_test.sql
{% test valid_email(component, column) %}
SELECT *
FROM {{ component }}
WHERE {{ column }} NOT LIKE '%@%.%'
{% endtest %}

Use custom tests

Reference custom tests in your Component:

tests:
  columns:
    email:
      - valid_email
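One caveat with the valid_email macro above: in SQL, NULL NOT LIKE '...' evaluates to NULL rather than true, so rows with a NULL email are not returned by the query and would not count as failures. A sketch that covers both cases, assuming built-in and custom tests can be listed together on the same column:

```yaml
tests:
  columns:
    email:
      - not_null              # catches NULLs, which the LIKE-based query does not return
      - valid_email           # catches non-NULL values that do not look like an email
```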

Parameterized custom tests

Add parameters to your custom tests:

tests/macros/value_in_list.sql
{% test value_in_list(component, column, allowed_values) %}
SELECT *
FROM {{ component }}
WHERE {{ column }} NOT IN ({{ allowed_values | join(', ') }})
{% endtest %}

tests:
  columns:
    status:
      - value_in_list:
          allowed_values:
            - "'active'"
            - "'inactive'"
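The nested quotes in the example above are needed because the join filter pastes each value into the SQL text as-is: the list renders as NOT IN ('active', 'inactive'). Numeric values need no inner quotes. A sketch with a hypothetical priority column, assuming the YAML list reaches the macro as a list:

```yaml
tests:
  columns:
    priority:                 # hypothetical numeric column
      - value_in_list:
          allowed_values:     # rendered by the macro as: NOT IN (1, 2, 3)
            - 1
            - 2
            - 3
```

Because the values are interpolated directly into the query, a string value without the inner single quotes would be rendered as a bare identifier rather than a string literal.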

Complete example

Here's a comprehensive example demonstrating multiple test types:

orders_validated.yaml
component:
  read:
    connection: warehouse
    snowflake:
      table: raw_orders
  tests:
    schema:
      match: exact
      columns:
        order_id: string
        customer_id: string
        amount: double
        status: string
        created_at: timestamp
    component:
      - count_greater_than:
          count: 0
      - combination_unique:
          columns:
            - order_id
    columns:
      order_id:
        - not_null
        - not_empty
      customer_id:
        - not_null
      amount:
        - not_null
        - greater_than:
            value: 0
        - less_than:
            value: 1000000
        - mean_in_range:
            min: 50
            max: 500
            severity: warn
      status:
        - not_null
        - in_set:
            values:
              - pending
              - processing
              - completed
              - cancelled
      created_at:
        - not_null
        - date_in_range:
            min: "2020-01-01"
            max: "2030-12-31"

Test reference

Column tests

| Test | Description | Parameters |
|------|-------------|------------|
| not_null | No NULL values | None |
| not_empty | No empty strings | None |
| unique | All values unique | None |
| in_range | Numeric values within range | min, max |
| date_in_range | Dates within range | min, max |
| in_set | Values in allowed set | values |
| greater_than | Values greater than threshold | value |
| less_than | Values less than threshold | value |
| greater_than_or_equal | Values greater than or equal to threshold | value |
| less_than_or_equal | Values less than or equal to threshold | value |
| substring_match | Contains substring | substring |
| mean_in_range | Mean within range | min, max |
| stddev_in_range | Standard deviation within range | min, max |
| count_distinct_equal | Distinct count equals | count, group_by_columns (optional) |

Component tests

| Test | Description | Parameters |
|------|-------------|------------|
| count_equal | Exact row count | count |
| count_greater_than | Rows greater than threshold | count, group_by_columns (optional) |
| count_less_than | Rows less than threshold | count, group_by_columns (optional) |
| count_greater_than_or_equal | Rows greater than or equal to threshold | count, group_by_columns (optional) |
| count_less_than_or_equal | Rows less than or equal to threshold | count, group_by_columns (optional) |
| combination_unique | Column combination unique | columns |

Next steps