Build a templatized data pipeline
Applications are powerful, reusable groups of derived Components bundled into a single visual unit within your Flow Graph. They allow you to package complex logic into modular, reusable building blocks that simplify your data pipelines.
This guide shows you how to create reusable data transformations using Ascend's Simple Application. The Simple Application is a pre-built Ascend solution that enables you to define parameterized Transforms with Jinja templating, creating flexible and maintainable data pipelines.
Want to learn more about Ascend-native Applications? Check out this concept guide.
Build a Simple Application
The Simple Application uses SQL or Python templates (or both) with Jinja to create parameterized Components. This guide demonstrates both approaches.
- SQL templates
- SQL and Python templates
This implementation uses SQL templates to build a straightforward filtering Application.
Set up your Project structure
Create a template directory structure in your Project to organize your SQL templates:
my_project/
├── templates/
│ └── flows/
│ └── basic_filter/
│ └── filter.sql
Create your SQL template
Add a .sql
file with placeholders for parameters:
SELECT *
FROM {{ ref(input_table) }}
WHERE {{ filter_column }} > {{ filter_value }}
LIMIT {{ row_limit }}
Configure the Application
Create a YAML configuration file that references the SQL template and defines parameter values:
component:
application:
application_id: simple # Simple Application
config:
template_path: templates/flows/basic_filter # Directory with your templates
parameters:
input_table: YOUR_INPUT_TABLE
filter_column: YOUR_COLUMN_NAME
filter_value: 100
row_limit: 1000
When Ascend processes this configuration, it generates a SQL Transform Component that filters your data based on the specified parameters.
This implementation uses both SQL and Python templates to build a data diff Application that identifies differences between two tables.
Set up your Project structure
Create a template directory structure in your Project to organize your templates:
my_project/
├── templates/
│ └── flows/
│ └── data_diff/
│ ├── a.sql
│ ├── b.sql
│ ├── diff.py
│ └── flagged.py
Create your templates
Add each template file with the following content:
First table query template (a.sql
)
This template retrieves data from the first table with any configured filters for comparison:
SELECT
*
FROM
{{ ref(a.get("name"), flow=a.get("flow")) }}
{%- if where %}
WHERE
{{ where }}
{%- endif %}
Second table query template (b.sql
)
This template retrieves data from the second table with any configured filters for comparison:
SELECT
*
FROM
{{ ref(b.get("name"), flow=b.get("flow")) }}
{%- if where %}
WHERE
{{ where }}
{%- endif %}
Data comparison logic (diff.py
)
This template contains the core comparison logic that:
- Dynamically compares schemas of two input tables
- Joins on all string/timestamp columns and selects all columns from both sides
- Raises an error if schemas don't match
import ibis
from ascend.application.context import ComponentExecutionContext
from ascend.common.events import log
from ascend.resources import application_subcomponent_name, application_subcomponent_ref, transform
from ibis import ir
@transform(
name=application_subcomponent_name("diff"),
inputs=[
application_subcomponent_ref("a", alias="a"),
application_subcomponent_ref("b", alias="b"),
],
)
def diff(a: ir.Table, b: ir.Table, context: ComponentExecutionContext) -> ir.Table:
Filtered results (flagged.py
)
This template filters the output from diff.py
, keeping only rows where the DIFF_SCORE
(representing the degree of difference for each row) exceeds a configurable threshold:
from ascend.application.context import ComponentExecutionContext
from ascend.common.events import log
from ascend.resources import (
application_subcomponent_name,
application_subcomponent_ref,
transform,
)
from ibis import ir
@transform(
name=application_subcomponent_name("flagged"),
inputs=[
application_subcomponent_ref("diff"),
],
Configure the Application
Create a YAML configuration file that references the templates and defines parameter values:
component:
application:
application_id: simple # Simple Application
config:
template_path: templates/flows/data_diff # Directory containing all templates
parameters:
a:
flow: data_quality
name: builder_sessions
alias: table_1
b:
name: simple__builder_sessions
alias: table_2
where: | # Date filter
TO_DATE(session_start_time) IN ('2022-10-15', '2025-03-01')
When Ascend processes this configuration, it generates SQL and Python Transforms that compare your tables based on the specified parameters.
🎉 Congratulations! You've successfully created a Simple Application that can be reused across different datasets and scenarios.