Skip to main content
Version: 3.0.0

Create a Simple SQL Transform

In this guide, we'll build a Simple SQL Transform that efficiently processes entire datasets using standard SQL.

Prerequisites​

Create a Transform​

You can create a Transform in two ways: through the form UI or directly in the Files panel.

  1. Double-click the Flow where you want to add your Transform
  2. Right-click on an existing component (typically a Read component or another Transform) that will provide input data
  3. Select Create Downstream → Transform Creating a Transform from the context menu
  4. Complete the form with these details:
    • Select your Flow
    • Enter a descriptive name for your Transform (e.g., sales_aggregation)
    • Choose the appropriate file type for your Transform logic Transform creation form

Create your Simple SQL Transform​

Simple SQL Transforms let you apply familiar SQL operations to your data while leveraging Ascend's powerful orchestration capabilities. These transforms execute standard SQL queries against referenced input components.

Basic structure​

Structure your SQL code using the ref function to reference upstream data sources, which can be either Read Components or other Transform Components.

Minimal example​

basic.sql
SELECT * FROM {{ ref("<YOUR-UPSTREAM-COMPONENT-NAME>") }}

Practical example​

simple.sql
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

Advanced Transform features​

Ascend offers powerful features that can significantly improve your data pipeline's performance, reliability, and maintainability.

Data partitioning​

By default, transforms process the entire dataset as a single unit. However, you can dramatically improve performance by partitioning your data into smaller chunks that process in parallel. Enable this by adding partition_by and granularity parameters to the ref function.

For example, to process taxi trip data by day:

repartitioning.sql
SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
/* Partition daily by pickup-datetime column */
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

When executed, Ascend partitions the input data by day based on the pickup_datetime column and processes each partition concurrently, significantly accelerating processing for large datasets.

For more on partitioning strategies, see our guides on advanced partitioning and incremental transforms.

Schema evolution handling​

You can define how your transform adapts to schema changes by adding the on_schema_change option to the config block. This is particularly valuable in environments where source data schemas evolve frequently.

This example shows how to automatically incorporate new columns:

schema_change.sql
/* Add on_schema_change option*/
{{
config(
on_schema_change='append_new_columns'
)
}}

SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

Data validation testing​

Integrate automated data quality checks using the with_test function to validate transform outputs. These tests execute whenever the transform runs, ensuring data meets your quality requirements.

Here we validate that:

  1. The vendor_id column contains no null values
  2. The output contains exactly 2,912,193 rows
tests.sql
{{
config(
on_schema_change='append_new_columns'
)
}}

SELECT
cabs.*,
pickup_zones.Borough AS pickup_borough,
pickup_zones.Zone AS pickup_zone,
pickup_zones.service_zone AS pickup_service_zone,
dropoff_zones.Borough AS dropoff_borough,
dropoff_zones.Zone AS dropoff_zone,
dropoff_zones.service_zone AS dropoff_service_zone
FROM {{ ref('all_cabs', partition_by='pickup_datetime', granularity='day') }} AS cabs
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS pickup_zones
ON cabs.pickup_location_id = pickup_zones.LocationID
LEFT OUTER JOIN {{ ref('taxi_zone_lookup') }} AS dropoff_zones
ON cabs.dropoff_location_id = dropoff_zones.LocationID

/* Add data quality checks */
{{ with_test("not_null", column="vendor_id") }}
{{ with_test("count_equal", count=2912193) }}

For a complete list of available tests, see our data quality testing reference.

🎉 Congratulations! You just created your first Simple SQL Transform in Ascend!