Test YAML
In this guide, you'll learn how to add data quality and validation tests to your YAML Components to ensure data integrity in your pipelines.
Prerequisites
- Ascend Flow
For a comprehensive overview of test types and when to use them, see our Tests concept guide.
Test format
YAML Components support both column-level and Component-level native tests through a dedicated tests section. These tests validate data quality automatically during pipeline execution.
Add native tests to YAML Components by specifying a tests section with component and columns subsections:
component:
  # ... component configuration ...
  tests:
    component:
      - test_name:
          parameter: value
    columns:
      column_name:
        - test_name
        - test_name_with_params:
            parameter: value
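As a concrete illustration, the template above might be filled in as follows. This is a minimal sketch, not a reference configuration: the column names order_id and amount are hypothetical, my_connection is a placeholder, and only test names that appear elsewhere in this guide are used.

```yaml
component:
  read:
    connection: my_connection  # placeholder connection name
    # ... read configuration ...
  tests:
    component:
      # Component-level test that takes a parameter
      - count_equal:
          count: 1000
    columns:
      order_id:  # hypothetical column name
        # Tests without parameters are plain list items
        - not_null
        - unique
      amount:  # hypothetical column name
        # Tests with parameters are single-key mappings
        - greater_than:
            value: 0
```

Parameterless tests appear as bare strings, while parameterized tests appear as a mapping from the test name to its parameters. Both forms can be mixed in the same list.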
Component-level tests
Component-level tests validate the entire output of your Component. Common use cases include verifying record counts and validating the schema.
Verify that your Component produces exactly the expected number of records:
component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    component:
      - count_equal:
          count: 1000
Column-level tests
Column-level tests validate individual columns in your dataset. Specify tests under the column name in the columns section.
Basic validation tests
Check for null values, empty strings, and duplicate values:
component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      user_id:
        - not_null
        - not_empty
      email:
        - not_null
        - unique
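Basic checks can also be combined with a parameterized test on the same column. The sketch below pairs not_null with count_distinct_equal, a test that appears in the complete example later in this guide. The status column and the expected count of 3 are hypothetical values chosen for illustration.

```yaml
component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      status:  # hypothetical column name
        - not_null
        # Expect exactly 3 distinct values (illustrative number)
        - count_distinct_equal:
            count: 3
```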
Value range tests
Validate that numeric columns fall within expected ranges:
component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      age:
        - greater_than:
            value: 0
        - less_than:
            value: 120
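The same two tests can bound a timestamp column, as the complete example in this guide does. The sketch below assumes a hypothetical created_at column. The nested quoting, a YAML double-quoted string wrapping a single-quoted literal, mirrors the complete example; reading the inner quotes as a SQL-style string literal is an inference, not something this guide states.

```yaml
component:
  read:
    connection: my_connection
    # ... read configuration ...
  tests:
    columns:
      created_at:  # hypothetical column name
        - greater_than:
            # Outer double quotes are YAML syntax; the inner single quotes stay in the value
            value: "'2023-08-25 00:00:00'"
        - less_than:
            value: "'2023-08-27 00:00:00'"
```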
Complete example
Here's a comprehensive example of a GCS Read Component with both Component-level and column-level tests:
component:
  read:
    connection: read_gcs_lake
    gcs:
      path: ottos-expeditions/lakev0/generated/events/sales_store.parquet/year=
      include:
        - glob: "*/month=*/day=*/*.parquet"
  tests:
    component:
      - count_equal:
          count: 8
    columns:
      _ascend_partition_uuid:
        - not_null
        - not_empty
        - count_distinct_equal:
            count: 2
      id:
        - unique
        - not_null
      price:
        - greater_than:
            value: 0
        - less_than:
            value: 1000000
      timestamp:
        - not_null
        - greater_than:
            value: "'2023-08-25 00:00:00'"
        - less_than:
            value: "'2023-08-27 00:00:00'"
      ascender_id:
        - not_null
        - not_empty
This example demonstrates:
- Component test: Verifies that exactly 8 records are returned
- Partition validation: Ensures partition UUIDs (_ascend_partition_uuid) are present, not empty, and take exactly 2 distinct values
- Data integrity: Validates that transaction IDs (id) are unique and not null
- Business rules: Ensures transaction amounts (price) are greater than 0 and less than 1,000,000
- Time bounds: Confirms timestamps are present and fall between 2023-08-25 and 2023-08-27
- Identifier presence: Validates that ascender_id values are present and not empty
Next steps
- Learn about SQL tests
- Explore Python tests
- Review the Tests concept guide