Create a SQL Transform
Prerequisites
Before you begin, ensure you have the following:
- An existing Ascend project (see Create a Project)
- An existing Ascend Workspace (see Setting Up a Workspace)
- An Ascend flow with at least one Read Component
Introduction
In this guide, we will walk through the process of transforming your data using SQL in Ascend with a Transform Component.
Step 1: Navigate to your flow folder
- From your Ascend Instance Homepage, click on your Workspace to open it.
- From here, click on the Files icon on the left navigation bar. This will open the File Tree in the left panel.
- In your File Tree, locate the Ascend project where you would like to create the Transform Component.
- Within the project folder, navigate into the flows folder, and then into the folder for the flow you would like to add the SQL transform to.
- Right-click on the components folder and select New file.
- Give the file a name with a .sql.jinja extension, such as my_transform.sql.jinja.
After creating the file, your project structure should look similar to the example below:
your_project_name
├── ascend_project.yaml
├── connections
├── flows
│   └── your_flow_name
│       └── components
│           ├── all_cabs.yaml
│           ├── my_transform.sql.jinja
│           └── taxi_zone_lookup.yaml
├── profiles
└── vaults
Step 2: Build your transform
- After you name the file, an empty file opens in the build panel. This is where you will define your SQL. Below is an example of a SQL transform.
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID
This should look similar to regular SQL, with the exception of the ref function. We use the {{ ref('component_name') }} syntax to reference our input components. In this example, we are referencing two input components: all_cabs.yaml and taxi_zone_lookup.yaml. An input component can be either a Read Component or another Transform Component.
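Because a Transform Component can itself serve as an input, the transform we just created could be referenced from another .sql.jinja file in the same flow. The sketch below is illustrative only; it assumes the component name matches the file name without its extensions (as it does for all_cabs above), and the downstream file name is hypothetical.
/* Hypothetical downstream transform, e.g. manhattan_pickups.sql.jinja */
SELECT *
FROM {{ ref('my_transform') }}
WHERE pickup_borough = 'Manhattan'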
Step 3: Configure advanced options
Repartitioning
By default, your transformation logic will run on the entire upstream dataset in a single Ascend partition. However, you can choose to repartition your data into time-based Ascend partitions that will be processed separately. We do this by adding the partition_by and granularity options to the ref function.
For example, in our transform, if we wanted to repartition the all_cabs input by day on the pickup_datetime column, we would update the ref function by adding partition_by='pickup_datetime', granularity='day'.
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
/* Partition daily by pickup-datetime column */
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID
When this transform runs, Ascend will first partition the all_cabs input, generating one partition per day based on the pickup_datetime column. Ascend will then run the transformation on each partition in parallel.
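As a rough illustration (not Ascend's actual execution plan), you can think of each daily partition as the transform's join running against just that day's slice of all_cabs; the example date below is hypothetical.
/* Illustrative sketch only: with granularity='day', the join effectively
   runs once per pickup date, roughly as if filtered like this */
SELECT
    cabs.*,
    pickup_zones.Borough AS pickup_borough
FROM all_cabs AS cabs
LEFT OUTER JOIN taxi_zone_lookup AS pickup_zones
    ON cabs.pickup_location_id = pickup_zones.LocationID
WHERE CAST(cabs.pickup_datetime AS DATE) = DATE '2019-01-01'  /* one such slice per day */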
To learn more about partitioning options in Ascend, see our guide on advanced partitioning with transforms, as well as our incremental transform guide.
Schema change options
You can also define the behavior you want when the schema of the transform changes. This is done by adding the on_schema_change option to the config block at the beginning of the transform.
For example, if we want to append new columns to the output schema when the input schema changes, we can add on_schema_change='append_new_columns' to the config block.
/* Add on_schema_change option */
{{
config(
on_schema_change='append_new_columns'
)
}}
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID
The schema change options available to you are listed here.
Data Quality Tests
Finally, you can define data quality tests on the output of your transform by adding the with_test function to your transform. In our example, we have added two tests:
- not_null on the vendor_id column to ensure there are no null values in this column
- count_equal to ensure the output has exactly 2912193 rows
{{
config(
on_schema_change='append_new_columns'
)
}}
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID
/* Add data quality checks */
{{ with_test("not_null", column="vendor_id") }}
{{ with_test("count_equal", count=2912193) }}
To see what other tests are available, check out our Reference Guide.
Next Steps
- Learn how to run your flow and view the output data.
- Learn the difference between Partitioned and Incremental transforms and how to use them to best suit your needs.
- Learn how to add parameters to your transform to enable dynamic behavior across flows.