Test YAML

In this guide, you'll learn how to add data quality and validation tests to your YAML Components to ensure data integrity in your pipelines.

Prerequisites

For a comprehensive overview of test types and when to use them, see our Tests concept guide.

Test format

YAML Components support both column-level and Component-level native tests through a dedicated tests section. These tests validate data quality automatically during pipeline execution.

Add native tests to YAML Components by specifying a tests section with component and columns subsections:

component:
  # ... component configuration ...
  tests:
    component:
      - test_name:
          parameter: value
    columns:
      column_name:
        - test_name
        - test_name_with_params:
            parameter: value
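As a concrete illustration of the template above, the sketch below fills in the placeholders using only test names that appear elsewhere in this guide (count_equal, not_null, unique, greater_than). The connection name my_connection and the columns order_id and quantity are hypothetical, chosen only to show the shape:

```yaml
component:
  read:
    connection: my_connection   # hypothetical connection name
    # ... read configuration ...
  tests:
    component:
      # Component-level: evaluated against the whole output
      - count_equal:
          count: 500
    columns:
      # Column-level: tests are listed under each column name
      order_id:                 # hypothetical column
        - not_null              # test without parameters
        - unique
      quantity:                 # hypothetical column
        - greater_than:         # test with a parameter
            value: 0
```

Tests without parameters are written as plain list items, while tests with parameters are single-key mappings whose nested keys hold the parameters.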

Component-level tests

Component-level tests validate the entire output of your Component. Common use cases include verifying record counts and validating schemas.

Verify that your Component produces exactly the expected number of records:

component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    component:
      - count_equal:
          count: 1000

Column-level tests

Column-level tests validate individual columns in your dataset. Specify tests under the column name in the columns section.

Basic validation tests

Check for null values, empty strings, and duplicate values:

component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      user_id:
        - not_null
        - not_empty
      email:
        - not_null
        - unique

Value range tests

Validate that numeric columns fall within expected ranges:

component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      age:
        - greater_than:
            value: 0
        - less_than:
            value: 120
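Range tests are not limited to numbers: the complete example later in this guide applies greater_than and less_than to a timestamp column. A minimal sketch of that pattern follows, assuming a hypothetical created_at column. The nested quoting is copied from that example: the outer double quotes delimit the YAML string, and the inner single quotes are part of the value itself. Presumably this lets the value be used as a quoted literal in the comparison, but treat that reading as an assumption rather than documented behavior:

```yaml
component:
  read:
    connection: my_connection   # hypothetical connection name
    # ... read configuration ...
  tests:
    columns:
      created_at:               # hypothetical timestamp column
        - greater_than:
            # inner single quotes are part of the value
            value: "'2023-08-25 00:00:00'"
        - less_than:
            value: "'2023-08-27 00:00:00'"
```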

Complete example

Here's a comprehensive example of a GCS Read Component with both Component-level and column-level tests:

read_bigquery_with_tests.yaml
component:
  read:
    connection: read_gcs_lake
    gcs:
      path: ottos-expeditions/lakev0/generated/events/sales_store.parquet/year=
      include:
        - glob: "*/month=*/day=*/*.parquet"
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2
      id:
        - unique
        - not_null
      price:
        - greater_than:
            value: 0
        - less_than:
            value: 1000000
      timestamp:
        - not_null
        - greater_than:
            value: "'2023-08-25 00:00:00'"
        - less_than:
            value: "'2023-08-27 00:00:00'"
      ascender_id:
        - not_null
        - not_empty

This example demonstrates:

  • Component test: Verifies exactly 8 records are returned
  • Partition validation: Ensures partition UUIDs are present and exactly 2 distinct values exist
  • Data integrity: Validates that id values are unique and not null
  • Business rules: Ensures price values are greater than 0 and less than 1,000,000
  • Time bounds: Confirms timestamps are present and fall after 2023-08-25 00:00:00 and before 2023-08-27 00:00:00
  • Ascender IDs: Validates that ascender_id values are present and not empty

Next steps