Smart schema
Smart schema is Ascend's approach to schema evolution and type reconciliation across data partitions: when the output schema changes, existing data does not have to be moved. To learn more about the problems that Smart schema solves, see the Ascend blog.
We're still working on this one! Expect changes and note that standard SLAs don't apply, so don't rely on this for production workloads yet. Refer to our release stages to learn more.
Support
Smart schema is currently supported for Read Components using object storage and Local File Connections on DuckDB, Snowflake, and BigQuery Data Planes.
Support for other Connection types and Component types will be added in future releases.
How Smart schema works
Traditional schema handling requires copying or moving data whenever the output schema changes, which is inefficient and time-consuming. Smart schema eliminates this overhead through three key mechanisms:
Partition-level schema storage: Each data partition is stored with its own native schema in tables whose names are generated based on the schema structure. This allows different partitions to coexist with varying schemas without immediate reconciliation.
Intelligent output schema computation: After all data has been ingested, Smart schema analyzes all partitions to compute a final output schema that reconciles type differences while preserving as much information as possible. Type conflicts between partitions are resolved according to configurable policies.
Advanced fingerprinting: The system maintains both metadata and data fingerprints that account for schema and data characteristics. When output schemas change, partition fingerprints are updated accordingly while preserving original metadata fingerprints for efficient data reuse across Flow runs.
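The three mechanisms above can be pictured with a small Python sketch. It is an illustration under stated assumptions, not Ascend's implementation: the function names, the `on_conflict` policy values, the `WIDENING_ORDER` list, and the naming and hashing schemes are all hypothetical.

```python
import hashlib
import json

# Hypothetical widening order: each later type can hold every value of the
# earlier ones (VARCHAR holds them as text). Real Data Planes have richer types.
WIDENING_ORDER = ["SMALLINT", "INTEGER", "BIGINT", "VARCHAR"]


def schema_table_name(prefix, schema):
    """Derive a deterministic table name from a schema (column -> type)."""
    # Sort columns so that column order does not change the name.
    canonical = json.dumps(sorted(schema.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}__{digest}"


def reconcile_types(left, right, on_conflict="widen"):
    """Pick one output type for a column that appears with two types."""
    if left == right:
        return left
    if on_conflict == "widen" and left in WIDENING_ORDER and right in WIDENING_ORDER:
        return max(left, right, key=WIDENING_ORDER.index)
    if on_conflict == "string":
        return "VARCHAR"
    # Any other policy, or a type outside the toy widening order, is an error.
    raise ValueError(f"cannot reconcile {left} with {right}")


def compute_output_schema(partition_schemas, on_conflict="widen"):
    """Merge per-partition schemas into a single output schema."""
    output = {}
    for schema in partition_schemas:
        for column, col_type in schema.items():
            if column in output:
                output[column] = reconcile_types(output[column], col_type, on_conflict)
            else:
                output[column] = col_type
    return output


def partition_fingerprint(metadata_fingerprint, output_schema):
    """Combine a stable metadata fingerprint with the current output schema."""
    canonical = json.dumps([metadata_fingerprint, sorted(output_schema.items())])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


partitions = [
    {"id": "INTEGER", "amount": "INTEGER"},
    {"id": "BIGINT", "amount": "VARCHAR", "note": "VARCHAR"},
]
for schema in partitions:
    # Each partition lands in a table named after its own native schema.
    print(schema_table_name("orders", schema), schema)

output_schema = compute_output_schema(partitions)
print(output_schema)
print(partition_fingerprint("meta-abc", output_schema))
```

In this sketch, a change to the output schema changes only the derived partition fingerprint. The metadata fingerprint passed in stays the same, which is what would let a later run recognize the stored partition and reuse it.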
Benefits
The key insight that drove our design is deceptively simple: store native, resolve at read time. By refusing to modify data when schemas change, we avoid the fundamental source of complexity, risk, and expense in traditional approaches.
Eliminates data movement: Schema changes no longer require copying or moving existing data partitions, dramatically improving performance for large datasets.
Preserves data integrity: Type reconciliation ensures that schema evolution maintains data accuracy and prevents information loss.
Optimizes storage efficiency: This is particularly beneficial for DuckDB Data Planes using DuckLake, where the traditional approach of using temporary tables for intermediate storage creates bottlenecks.
Enables seamless backfill: Smart schema makes backfill operations more efficient by allowing different historical partitions to maintain their original schemas until final reconciliation.
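To make "store native, resolve at read time" concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a Data Plane. The table names, the `output_schema` mapping, and the view-based approach are assumptions chosen for illustration and do not describe how Ascend resolves schemas internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two partitions stored as-is, each with its own native schema.
conn.execute("CREATE TABLE part_a (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE part_b (id INTEGER, amount TEXT, note TEXT)")
conn.execute("INSERT INTO part_a VALUES (1, 10)")
conn.execute("INSERT INTO part_b VALUES (2, '12.50', 'refund')")

# Hypothetical reconciled output schema and the columns each partition has.
output_schema = {"id": "INTEGER", "amount": "TEXT", "note": "TEXT"}
partition_columns = {
    "part_a": {"id", "amount"},
    "part_b": {"id", "amount", "note"},
}


def select_for(table, columns):
    """Project one partition onto the output schema without rewriting it."""
    exprs = []
    for name, col_type in output_schema.items():
        if name in columns:
            exprs.append(f"CAST({name} AS {col_type}) AS {name}")
        else:
            # Column is absent in this partition, so expose it as NULL.
            exprs.append(f"NULL AS {name}")
    return f"SELECT {', '.join(exprs)} FROM {table}"


# The view applies casts when it is queried; the partition tables stay untouched.
view_sql = " UNION ALL ".join(
    select_for(table, cols) for table, cols in partition_columns.items()
)
conn.execute(f"CREATE VIEW orders AS {view_sql}")

rows = conn.execute("SELECT id, amount, note FROM orders ORDER BY id").fetchall()
print(rows)  # [(1, '10', None), (2, '12.50', 'refund')]
```

If the output schema changes again, only the view definition needs to be regenerated in this sketch. The rows in `part_a` and `part_b` are never copied or rewritten.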
Next steps
Learn how to implement schema change strategies in your Ascend pipelines, including how to configure Smart schema for optimal performance.