Skip to main content
Version: 3.0.0

Create a SQL Transform

Prerequisites

Before you begin, ensure you have the following:

Introduction

In this guide, we will walk through the process of transforming your data using SQL in Ascend with a Transform Component.

Step 1: Navigate to your Flow folder.

  1. From your Ascend Instance Homepage, click on your Workspace to open it.
  2. From here, click on the Files icon on the left navigation bar. This will open the File Tree in the left panel.
  3. In your File Tree, locate the Ascend project where you would like to create the Transform Component.
  4. Within the project folder, navigate into the flows folder, and then into the folder for the flow you would like to add the SQL transform to.
  5. Right-click on the components folder and select New file.
  6. Provide the file with a name and a .sql.jinja extension, such as my_transform.sql.jinja. After creating the file, your project structure should look similar to the example below:
your_project_name
├── ascend_project.yaml
├── connections
├── flows
│ └── your_flow_name
│ └── components
│ ├── all_cabs.yaml
│ ├── my_transform.sql.jinja
│ └── taxi_zone_lookup.yaml
├── profiles
└── vaults

Step 2: Build your transform

  1. After naming the file, an empty file will open in the build panel. This is where you will define your SQL. Below is an example of a SQL transform.
my_project/flows/my_flow/components/my_transform.sql.jinja
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

This should look similar to regular SQL, with the exception of the ref function. We use the {{ ref('component_name') }} syntax to reference our input components. In this example, we are referencing two input components: all_cabs.yaml and taxi_zone_lookup.yaml. The input components can be either a Read Component or another Transform Component.

Step 3: Configuring Advanced Options

Repartitioning

By default, your transformation logic will run on the entire upstream dataset in a single Ascend partition. However, you can choose to repartition your data into time-based Ascend partitions that will be processed separately. We do this by adding the partition_by and granularity options to the ref function.

For example, in our transform, if we wanted to repartition the all_cabs input by day on the pickup_datetime column, we would update the ref function by adding partition_by='pickup_datetime', granularity='day'.

my_project/flows/my_flow/components/my_transform.sql.jinja
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
/* Partition daily by pickup-datetime column */
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

When this transform runs, Ascend will first partition the input, all_cabs, generating a partition per day on the pickup_datetime column. Ascend will then run the transformation on each partition in parallel. To learn more about partitioning options in Ascend, see our guide on advanced partitioning with transforms, as well as our incremental transform guide.

Schema change options

You can also define the behavior you want when the schema of the transform changes. This can be done by adding the on_schema_change option to the config block at the beginning of the transform. For example, if we want to append new columns to the output schema when the input schema changes, we can add on_schema_change='append_new_columns' to the config block.

my_project/flows/my_flow/components/my_transform.sql.jinja
/* Add on_schema_change option*/
{{
config(
on_schema_change='append_new_columns'
)
}}

SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

The schema changes options available to you are listed here.

Data Quality Tests

Finally, you can define data quality tests on the output of your transform by adding the with_test function to your transform. In our example, we have added two tests:

  1. not_null on the vendor_id column to ensure there are no null values in this column
  2. count_equal to ensure the output has exactly 2912193 rows
my_project/flows/my_flow/components/my_transform.sql.jinja
{{
config(
on_schema_change='append_new_columns'
)
}}

SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

/* Add data quality checks */
{{ with_test("not_null", column="vendor_id") }}
{{ with_test("count_equal", count=2912193) }}

To see what other tests are available, check out our Reference Guide.

Next Steps