Version: 3.0.0

Create a SQL Transform

Prerequisites

Before you begin, ensure you have the following:

An existing Ascend project (see Create a Project)
An existing Ascend Workspace (see Setting Up a Workspace)
An Ascend flow with at least one Read Component

Introduction

In this guide, we will walk through the process of transforming your data using SQL in Ascend with a Transform Component.

Step 1: Navigate to your Flow folder.

From your Ascend Instance Homepage, click on your Workspace to open it.
From here, click on the Files icon on the left navigation bar. This will open the File Tree in the left panel.
In your File Tree, locate the Ascend project where you would like to create the Transform Component.
Within the project folder, navigate into the flows folder, and then into the folder for the flow you would like to add the SQL transform to.
Right-click on the components folder and select New file.
Provide the file with a name and a .sql.jinja extension, such as my_transform.sql.jinja. After creating the file, your project structure should look similar to the example below:

your_project_name
├── ascend_project.yaml
├── connections
├── flows
│   └── your_flow_name
│       └── components
│           ├── all_cabs.yaml
│           ├── my_transform.sql.jinja
│           └── taxi_zone_lookup.yaml
├── profiles
└── vaults

Step 2: Build your transform

After naming the file, an empty file will open in the build panel. This is where you will define your SQL. Below is an example of a SQL transform.

my_project/flows/my_flow/components/my_transform.sql.jinja
SELECT
  cabs.*,
  pickup_zones.Borough AS pickup_borough,
  pickup_zones.Zone AS pickup_zone,
  pickup_zones.service_zone AS pickup_service_zone,
  dropoff_zones.Borough AS dropoff_borough,
  dropoff_zones.Zone AS dropoff_zone,
  dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

This should look similar to regular SQL, with the exception of the ref function. We use the {{ ref('component_name') }} syntax to reference our input components. In this example, we are referencing two input components: all_cabs.yaml and taxi_zone_lookup.yaml. The input components can be either a Read Component or another Transform Component.

Step 3: Configuring Advanced Options

Repartitioning

By default, your transformation logic will run on the entire upstream dataset in a single Ascend partition. However, you can choose to repartition your data into time-based Ascend partitions that will be processed separately. We do this by adding the partition_by and granularity options to the ref function.

For example, in our transform, if we wanted to repartition the all_cabs input by day on the pickup_datetime column, we would update the ref function by adding partition_by='pickup_datetime', granularity='day'.

my_project/flows/my_flow/components/my_transform.sql.jinja
SELECT
  cabs.*,
  pickup_zones.Borough AS pickup_borough,
  pickup_zones.Zone AS pickup_zone,
  pickup_zones.service_zone AS pickup_service_zone,
  dropoff_zones.Borough AS dropoff_borough,
  dropoff_zones.Zone AS dropoff_zone,
  dropoff_zones.service_zone AS dropoff_service_zone
/* Partition daily by pickup-datetime column */
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

When this transform runs, Ascend will first partition the input, all_cabs, generating a partition per day on the pickup_datetime column. Ascend will then run the transformation on each partition in parallel. To learn more about partitioning options in Ascend, see our guide on advanced partitioning with transforms, as well as our incremental transform guide.

Schema change options

You can also define the behavior you want when the schema of the transform changes. This can be done by adding the on_schema_change option to the config block at the beginning of the transform. For example, if we want to append new columns to the output schema when the input schema changes, we can add on_schema_change='append_new_columns' to the config block.

my_project/flows/my_flow/components/my_transform.sql.jinja
/* Add on_schema_change option*/
{{
  config(
    on_schema_change='append_new_columns'
  )
}}

SELECT
  cabs.*,
  pickup_zones.Borough AS pickup_borough,
  pickup_zones.Zone AS pickup_zone,
  pickup_zones.service_zone AS pickup_service_zone,
  dropoff_zones.Borough AS dropoff_borough,
  dropoff_zones.Zone AS dropoff_zone,
  dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

The schema changes options available to you are listed here.

Data Quality Tests

Finally, you can define data quality tests on the output of your transform by adding the with_test function to your transform. In our example, we have added two tests:

not_null on the vendor_id column to ensure there are no null values in this column
count_equal to ensure the output has exactly 2912193 rows

my_project/flows/my_flow/components/my_transform.sql.jinja
{{
  config(
    on_schema_change='append_new_columns'
  )
}}

SELECT
  cabs.*,
  pickup_zones.Borough AS pickup_borough,
  pickup_zones.Zone AS pickup_zone,
  pickup_zones.service_zone AS pickup_service_zone,
  dropoff_zones.Borough AS dropoff_borough,
  dropoff_zones.Zone AS dropoff_zone,
  dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

/* Add data quality checks */
{{ with_test("not_null", column="vendor_id") }}
{{ with_test("count_equal", count=2912193) }}

To see what other tests are available, check out our Reference Guide.

Next Steps

Learn how to run your flow and view the output data.
Learn the difference between Partitioned and Incremental transforms and how to use them to best suit your needs.
Learn how to add parameters to your transform to enable dynamic behavior across flows.

Prerequisites​

Introduction​

Step 1: Navigate to your Flow folder.​

Step 2: Build your transform​

Step 3: Configuring Advanced Options​

Repartitioning​

Schema change options​

Data Quality Tests​

Next Steps​