Test YAML

In this guide, you'll learn how to add data quality and validation tests to your YAML Components to ensure data integrity in your pipelines.

Prerequisites

For a comprehensive overview of test types and when to use them, see our Tests concept guide.

Test format

YAML Components support both column-level and Component-level native tests through a dedicated tests section. These tests validate data quality automatically during pipeline execution.

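Add native tests to a YAML Component by specifying a tests section with component and columns subsections. List tests that take no parameters by name; for tests that take parameters, map the test name to its parameters: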

component:
  # ... component configuration ...
  tests:
    component:
      - test_name:
          parameter: value
    columns:
      column_name:
        - test_name
        - test_name_with_params:
            parameter: value

Component-level tests

Component-level tests validate the entire output of your Component. Common use cases include verifying record counts and validating schemas.

Verify that your Component produces exactly the expected number of records:

component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    component:
      - count_equal:
          count: 1000

Column-level tests

Column-level tests validate individual columns in your dataset. Specify tests under the column name in the columns section.

Basic validation tests

Check for null values, empty strings, and duplicate values:

component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      user_id:
        - not_null
        - not_empty
      email:
        - not_null
        - unique
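
Column-level tests can also take parameters. The complete example later in this guide, for instance, uses count_distinct_equal to assert how many distinct values a column holds. A minimal sketch, assuming a hypothetical status column that should contain exactly three distinct values (the column name and the count are illustrative, not taken from a real dataset):

```yaml
component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      status:  # hypothetical column name
        - not_null
        - count_distinct_equal:
            count: 3  # illustrative value
```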

Value range tests

Validate that numeric columns fall within expected ranges:

component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      age:
        - greater_than:
            value: 0
        - less_than:
            value: 120
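
Range tests are not restricted to numeric columns. The complete example later in this guide applies greater_than and less_than to a timestamp column, passing each bound as a single-quoted literal wrapped in a YAML string. A minimal sketch following that pattern, assuming a hypothetical created_at column and illustrative dates:

```yaml
component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      created_at:  # hypothetical column name
        - greater_than:
            value: "'2024-01-01 00:00:00'"  # illustrative lower bound
        - less_than:
            value: "'2024-02-01 00:00:00'"  # illustrative upper bound
```

The inner single quotes mirror the quoting used in the complete example; whether other literal formats are accepted is not covered in this guide.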

Complete example

Here's a comprehensive example of a GCS Read Component with both Component-level and column-level tests:

read_bigquery_with_tests.yaml
component:
  read:
    connection: read_gcs_lake
    gcs:
      path: ottos-expeditions/lakev0/generated/events/sales_store.parquet/year=
      include:
        - glob: "*/month=*/day=*/*.parquet"
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2
      id:
        - unique
        - not_null
      price:
        - greater_than:
            value: 0
        - less_than:
            value: 1000000
      timestamp:
        - not_null
        - greater_than:
            value: "'2023-08-25 00:00:00'"
        - less_than:
            value: "'2023-08-27 00:00:00'"
      ascender_id:
        - not_null
        - not_empty

This example demonstrates:

  • Component test: Verifies that exactly 8 records are returned
  • Partition validation: Ensures partition UUIDs are present, not empty, and have exactly 2 distinct values
  • Data integrity: Validates that id values are unique and not null
  • Business rules: Ensures price is greater than 0 and less than 1,000,000
  • Time bounds: Confirms timestamps are present and fall between 2023-08-25 and 2023-08-27
  • Ascender identifier: Validates that ascender_id is present and not empty

Next steps