Reshape
The reshape specification defines how input data should be partitioned and processed in Smart operations. It determines the granularity and strategy for data processing across different time periods or partitions.
Overview
Reshape functionality is essential for optimizing data processing workflows by controlling how input Components' data is partitioned and aggregated. This is particularly important in time-series data processing where you need to process data in specific time windows or granularities.
Reshape options
The reshape parameter accepts the following options:
String values
Value | Description |
---|---|
"full" | Processes all available data as a single partition. Use this when you need complete data aggregation across all time periods. |
"map" | Processes data partition-wise, maintaining the existing partition structure. Use this for efficient parallel processing where each partition can be processed independently. |
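The difference between the two string values can be illustrated with a small, self-contained Python model. This is only a conceptual sketch: `apply_reshape`, the partition dictionary, and the `"all"` result key are hypothetical names invented for illustration, not part of the platform's API.

```python
from typing import Callable, Dict, List

Partition = List[dict]


def apply_reshape(
    partitions: Dict[str, Partition],
    reshape: str,
    fn: Callable[[Partition], object],
) -> Dict[str, object]:
    """Hypothetical model of how "full" and "map" differ."""
    if reshape == "full":
        # All partitions are combined and processed as a single unit.
        combined = [row for key in sorted(partitions) for row in partitions[key]]
        return {"all": fn(combined)}
    if reshape == "map":
        # Each partition is processed independently; the keys are preserved.
        return {key: fn(rows) for key, rows in partitions.items()}
    raise ValueError(f"unsupported reshape value: {reshape!r}")


def total(rows: Partition) -> int:
    return sum(row["value"] for row in rows)


# Illustrative input: two partitions keyed by date.
partitions = {
    "2023-01-01": [{"value": 1}, {"value": 2}],
    "2023-01-02": [{"value": 3}],
}

full_result = apply_reshape(partitions, "full", total)
map_result = apply_reshape(partitions, "map", total)
```

In this model, "full" yields one result computed over every row, while "map" yields one result per input partition, which is why "map" suits work that can run partition by partition in parallel.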
Time-based reshape
For finer control, you can pass a time-based reshape configuration, which partitions the data by a timestamp column at a chosen granularity.
Time-based reshape configuration
Reshape specification
The reshape specification is defined using a nested structure that allows you to reshape input Component data by time periods.
reshape = {
    "time": {
        "column": "timestamp_column_name",
        "granularity": "day"
    }
}
Time-based reshape structure
Property | Type | Required | Description |
---|---|---|---|
time | TimeBasedReshapeOptions | No | Options for reshaping the input Component data by time |
Time-based reshape options
The time property accepts a TimeBasedReshapeOptions object with the following properties:
Property | Type | Required | Description |
---|---|---|---|
column | string | Yes | Timestamp column to reshape by |
granularity | string | Yes | Time granularity to use for reshaping |
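To make the effect of column and granularity concrete, the sketch below groups rows into time buckets with the Python standard library. It is a conceptual model only: `bucket_by_time` and the truncation formats are assumptions made for illustration, and only the "day" and "hour" granularities that appear on this page are modeled.

```python
from collections import defaultdict
from datetime import datetime

# Assumed truncation formats for the two granularities shown on this page.
_TRUNCATE = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
}


def bucket_by_time(rows, column, granularity):
    """Group rows by the timestamp in `column`, truncated to `granularity`."""
    fmt = _TRUNCATE[granularity]  # raises KeyError for unmodeled granularities
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[column].strftime(fmt)].append(row)
    return dict(buckets)


# Illustrative rows; the column name mirrors the examples on this page.
rows = [
    {"event_time": datetime(2023, 1, 1, 9, 15), "value": 1},
    {"event_time": datetime(2023, 1, 1, 9, 45), "value": 2},
    {"event_time": datetime(2023, 1, 1, 17, 0), "value": 3},
    {"event_time": datetime(2023, 1, 2, 8, 30), "value": 4},
]

by_day = bucket_by_time(rows, "event_time", "day")
by_hour = bucket_by_time(rows, "event_time", "hour")
```

With "day", the four rows fall into two buckets; with "hour", the same rows fall into three, so a finer granularity produces more, smaller partitions.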
Usage in ref() function
The reshape specification is commonly used in the ref() function when referencing Components:
# Using string reshape
ref("my_table", reshape="map")
ref("my_table", reshape="full")
# Using time-based reshape
ref("my_table", reshape={
    "time": {
        "column": "DateTime",
        "granularity": "day"
    }
})
Implementation notes
- The reshape parameter takes precedence over deprecated parameters like partition_by and map_partitions
- When using time-based reshape, the system internally converts this to a repartition specification with the specified column and granularity
- Invalid reshape specifications will raise validation errors during processing
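The rules described on this page can be checked up front with a small helper such as the one below. This is a hypothetical pre-flight check written for illustration, not the platform's own validator: `validate_reshape` is an invented name, and the exact errors the system raises are not documented here. The set of accepted granularity strings is also not listed on this page, so the sketch only checks that the value is a non-empty string.

```python
def validate_reshape(reshape):
    """Hypothetical check mirroring the reshape rules described on this page."""
    if isinstance(reshape, str):
        # Only the two documented string values are accepted.
        if reshape not in ("full", "map"):
            raise ValueError(f"unknown reshape value: {reshape!r}")
        return
    if not isinstance(reshape, dict):
        raise TypeError("reshape must be a string or a dict")

    time_opts = reshape.get("time")
    if time_opts is None:
        return  # "time" is listed as optional in the structure table
    if not isinstance(time_opts, dict):
        raise TypeError('"time" must be a dict')

    # Both properties are listed as required strings.
    for key in ("column", "granularity"):
        value = time_opts.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f'"time.{key}" is required and must be a non-empty string')


# These calls pass silently because the specifications are well formed.
validate_reshape("map")
validate_reshape({"time": {"column": "DateTime", "granularity": "day"}})
```

Running a check like this before calling ref() surfaces a malformed specification early instead of waiting for a validation error during processing.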
Example
-- Example SQL using reshape in ref()
SELECT * FROM {{ ref("source_table", reshape={"time": {"column": "event_time", "granularity": "hour"}}) }}
WHERE event_time >= '2023-01-01'