Version: 3.0.0

Schema change

This tutorial shows how to implement schema change strategies in your Ascend pipelines so they avoid schema mismatch errors and stay resilient as source schemas evolve.

What you'll learn

In this tutorial, you'll learn:

  • How to identify and handle schema changes in your data pipelines
  • Different schema change strategies and when to use each one
  • How to implement schema change handling in Read and Transform Components
  • Best practices for managing schema evolution in production environments

Why is schema change handling important?

Data schemas evolve over time as business requirements change. Without proper schema change handling, your data pipelines can break when:

  • New columns are added to source tables
  • Column data types change
  • Columns are renamed or removed
  • Table structures are reorganized

Ascend's schema change strategies allow your pipelines to adapt to these changes automatically, reducing maintenance overhead and ensuring continuous data flow.
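To make the kinds of change listed above concrete, here is a minimal sketch of how two schemas could be compared. It is plain Python rather than an Ascend API: `diff_schemas` and the column-name-to-type dictionaries are hypothetical names chosen for illustration.

```python
def diff_schemas(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Compare two schemas given as {column_name: type_name} mappings (hypothetical helper)."""
    return {
        # Columns the source has that the target lacks
        "added": [col for col in source if col not in target],
        # Columns the target still has that the source no longer provides
        "removed": [col for col in target if col not in source],
        # Columns present in both whose type differs
        "retyped": [col for col in source if col in target and source[col] != target[col]],
    }


target = {"id": "int", "name": "string", "legacy_flag": "bool"}
source = {"id": "bigint", "name": "string", "email": "string"}
changes = diff_schemas(source, target)
print(changes)
```

A renamed column shows up here as one removal plus one addition, because a name-based comparison cannot tell a rename from a drop followed by an add.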

Full refresh

  • Run Components and Flows in full-refresh mode to resolve schema mismatch errors and completely refresh the Data Plane schema for that Component
  • A full refresh rebuilds the entire dataset from scratch, ensuring the schema is up to date
  • To perform a full refresh on an entire Flow:
    1. Navigate to the Build Info panel and click Run Flow
    2. In the Run Flow dialog, scroll down to the Advanced Actions section and check the box for the Full Refresh option
    3. Click Run to execute the entire Flow in full-refresh mode
  • For targeted schema updates, you can perform a full refresh on individual Components or a selected subset of Components by using the Component context menu

Schema change strategies

Ascend Components provide several options for handling schema changes through the on_schema_change parameter:

sync_all_columns

The "sync_all_columns" strategy synchronizes the target table's columns to exactly match the source schema. When using this strategy:

  • Columns present in the source but missing in the target are added
  • Columns in the target but not in the source are removed
  • Data types of existing columns are updated to match the source
  • Original column order is preserved for existing columns, with new columns added at the end

This strategy ensures the target always fully reflects the current structure of the source, at the cost of dropping target columns, along with their data, once they no longer exist in the source.
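The column-level effect described above can be sketched in plain Python. This is an illustration of the rules in the list, not Ascend's implementation; the function name and the dictionary representation of a schema are assumptions made for the example.

```python
def sync_all_columns(target: dict[str, str], source: dict[str, str]) -> dict[str, str]:
    """Return the target schema after syncing it to the source (illustrative only)."""
    # Keep target columns that still exist in the source, in their original
    # order, but take each type from the source.
    synced = {col: source[col] for col in target if col in source}
    # Append columns that are new in the source at the end.
    for col, dtype in source.items():
        if col not in synced:
            synced[col] = dtype
    return synced


target = {"id": "int", "name": "string", "legacy_flag": "bool"}
source = {"name": "string", "id": "bigint", "email": "string"}
synced = sync_all_columns(target, source)
print(list(synced.items()))
```

In this sketch `legacy_flag` is removed, `id` takes the source type, and `email` lands after the existing columns.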

append_new_columns

The "append_new_columns" strategy adds new columns from the source to the target table without modifying existing columns. When using this strategy:

  • New columns detected in the source data are added to the target
  • Existing columns in the target remain unchanged, even if they no longer exist in the source
  • Data type changes to existing columns are not propagated

This approach preserves all current data and structure, simply extending the schema to accommodate new fields as they appear over time.
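As a rough illustration of these rules, the following sketch applies them to schemas represented as column-name-to-type dictionaries. The function name and representation are hypothetical and only mirror the behavior listed above.

```python
def append_new_columns(target: dict[str, str], source: dict[str, str]) -> dict[str, str]:
    """Return the target schema extended with columns that are new in the source."""
    result = dict(target)  # existing columns and their types stay untouched
    for col, dtype in source.items():
        if col not in result:
            result[col] = dtype  # only genuinely new columns are added
    return result


target = {"id": "int", "name": "string", "legacy_flag": "bool"}
source = {"id": "bigint", "name": "string", "email": "string"}
extended = append_new_columns(target, source)
print(list(extended.items()))
```

Here `legacy_flag` survives even though the source dropped it, and `id` keeps its original type because type changes are not propagated.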

ignore

The "ignore" strategy maintains the existing target schema without any modifications:

  • No columns are added, removed, or modified regardless of source schema changes
  • The system will attempt to write data using the existing schema
  • If incompatible data is encountered (e.g., missing required columns), an error will occur
  • This is useful when you want to strictly control schema through other means
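One way to picture writing with a fixed schema is the sketch below. It is not Ascend's implementation: `conform_to_target` is a hypothetical helper, and the choice to silently drop extra source columns is an assumption made for the example, not documented behavior.

```python
def conform_to_target(row: dict, target_columns: list[str]) -> dict:
    """Shape an incoming row to the existing target columns (illustrative only)."""
    missing = [col for col in target_columns if col not in row]
    if missing:
        # Incompatible data: the target expects columns the source no longer has.
        raise ValueError(f"Row is missing target columns: {missing}")
    # Assumption: columns unknown to the target are left out of the write.
    return {col: row[col] for col in target_columns}


target_columns = ["id", "name"]
written = conform_to_target({"id": 1, "name": "otto", "email": "o@example.com"}, target_columns)
print(written)
```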

fail

The "fail" strategy provides the strictest approach to schema management:

  • Any schema mismatch between source and target immediately raises an error
  • No automatic schema modifications are permitted
  • Forces manual intervention for any schema changes
  • Useful when you want complete control over schema evolution and data migrations
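The strictness of this strategy can be sketched as a guard that refuses to proceed on any difference. The exception class and function below are hypothetical names for illustration, not part of the Ascend API.

```python
class SchemaMismatchError(Exception):
    """Raised when source and target schemas differ (hypothetical exception)."""


def assert_schema_unchanged(target: dict[str, str], source: dict[str, str]) -> None:
    # Dict equality compares column names and types; column order is ignored here.
    if source != target:
        differing = sorted(
            col for col in set(source) | set(target)
            if source.get(col) != target.get(col)
        )
        raise SchemaMismatchError(f"Schema mismatch in columns: {differing}")


target = {"id": "int", "name": "string"}
assert_schema_unchanged(target, {"id": "int", "name": "string"})  # passes silently

try:
    assert_schema_unchanged(target, {"id": "bigint", "name": "string"})
    message = ""
except SchemaMismatchError as err:
    message = str(err)
print(message)
```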

drop_and_recreate

The "drop_and_recreate" strategy takes the most aggressive approach to schema changes:

  • Completely drops the existing target table
  • Creates a new table with the exact schema of the source
  • Loads all data with the new schema structure
  • Results in perfect schema alignment with the source
  • May cause data loss if the new schema is incompatible with existing data

This strategy is useful for development environments or when you need a clean slate after major schema changes.
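The drop, create, and load sequence above can be sketched against an in-memory SQLite database using Python's standard `sqlite3` module. This only illustrates the general pattern; the `drop_and_recreate` helper, table name, and columns are made up for the example and say nothing about how Ascend runs it on your Data Plane.

```python
import sqlite3


def drop_and_recreate(conn, table, columns, rows):
    """Replace `table` with a fresh one built from `columns` and load `rows`."""
    col_defs = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')  # old schema and old data are gone
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, legacy_flag INTEGER)")
conn.execute("INSERT INTO events VALUES (1, 0)")

drop_and_recreate(
    conn,
    "events",
    [("id", "INTEGER"), ("email", "TEXT")],
    [(1, "a@example.com"), (2, "b@example.com")],
)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(events)")]
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(column_names, row_count)
```

Anything stored only in the old table, such as `legacy_flag` here, is lost unless the reload brings it back.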

You can specify your chosen schema change strategy in the on_schema_change parameter available in Read and Transform Components.

Examples

This section demonstrates how to specify a schema change strategy in different types of Components in Ascend. By configuring the on_schema_change parameter correctly, you can ensure your data pipelines remain resilient when source schemas evolve.

Incremental Python Read

In this example, an Incremental Read Component specifies the "sync_all_columns" schema change strategy in the @read decorator:

read_metabook.py
import polars as pl
import pyarrow as pa
from ascend.application.context import ComponentExecutionContext
from ascend.common.events import log
from ascend.resources import read


@read(
    strategy="incremental",
    incremental_strategy="merge",
    unique_key="id",
    on_schema_change="sync_all_columns",
)
def read_metabook(context: ComponentExecutionContext) -> pa.Table:
    df = pl.read_parquet("gs://ascend-io-gcs-public/ottos-expeditions/lakev0/generated/events/metabook.parquet/year=*/month=*/day=*/*.parquet")
    # Data already materialized for this Component, if any
    current_data = context.current_data()
    if current_data is not None:
        current_data = current_data.to_polars()
        # Only keep rows newer than the latest timestamp already loaded
        max_ts = current_data["timestamp"].max()
        log(f"Reading data after {max_ts}")
        df = df.filter(df["timestamp"] > max_ts)
    else:
        log("No current data found, reading all data")

    log(f"Returning {df.height} rows")
    return df.to_arrow()

Partitioned Local Files Read Component

This example demonstrates how to configure a partitioned Local Files Read Component with explicit schema change handling:

read_partitioned_local.yaml
component:
  read:
    connection: local_fs
    local_file:
      path: rc_schema_evolution/
      include:
        - regex: .*\.csv
      parser:
        csv:
          has_header: true
    strategy:
      partitioned:
        enable_substitution_by_partition_name: false
        on_schema_change: append_new_columns

Incremental Python Transform

The following Python Transform Component uses the on_schema_change parameter to specify how to handle schema changes when running incrementally:

incremental_transform.py
from ascend.resources import ref, transform


@transform(
    inputs=[ref("schema_shifting_data")],
    materialized="incremental",
    incremental_strategy="merge",
    unique_key="key",
    merge_update_columns=["string", "ts"],
    on_schema_change="sync_all_columns",
)
def incremental_transform_schema_evol_python_sync(schema_shifting_data, context):
    def _n(x):
        # Use the name as-is when the lowercase "string" column is present;
        # otherwise fall back to the upper-case column name.
        return x if "string" in schema_shifting_data else x.upper()

    output = schema_shifting_data
    if context.is_incremental:
        # Keep only rows newer than the latest timestamp already materialized
        current_data = context.current_data()
        output = output[output[_n("ts")] > current_data[_n("ts")].max()]

    return output

Incremental SQL Transform

This SQL Transform example shows how to configure schema change handling in SQL using the config block:

{{
  config(
    materialized="incremental",
    incremental_strategy="merge",
    unique_key="key",
    merge_update_columns=["string", "ts"],
    on_schema_change="sync_all_columns", # Other options: append_new_columns, ignore, fail
  )
}}
SELECT * FROM {{ ref("schema_shifting_data") }}

{% if is_incremental() %}
WHERE ts > (SELECT ts FROM {{ this }} ORDER BY ts DESC LIMIT 1)
{% endif %}

Summary

  • Choose sync_all_columns when you want to ensure target tables always reflect current source structure
  • Use append_new_columns when you want to preserve historical columns while adding new ones
  • Select ignore when you want to maintain existing schema without modifications
  • Use fail when you need strict control over schema changes
  • Apply drop_and_recreate when you need a completely fresh schema alignment and can accept data rebuilding
  • Consider a full refresh when making significant schema changes to ensure data consistency
  • Test schema change strategies in development environments before applying them in production
  • Document your schema change strategy choices for each Component to aid in maintenance