Skip to main content
Version: 3.0.0

SQL Transform

SQL transforms execute SQL queries to transform data.

SqlTransform

info

SqlTransform is defined beneath the following ancestor nodes in the YAML structure:

Below are the properties for the SqlTransform. Each property links to the specific details section further down in this page.

PropertyDefaultTypeRequiredDescription
event_timestring
NoTimestamp column in the component output used to represent event time.
microbatchboolean
NoWhether to process data in microbatches.
batch_sizestring
NoThe size/time granularity of the microbatch to process.
lookback1integerNoThe number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
beginstring
NoThe 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputsarray[InputComponent]
NoList of input names to use in the SQL query.
strategyAny of:
  PartitionedStrategy
  IncrementalStrategy
  string ("view", "table")
NoTransform strategy - incremental, partitioned, or view/table.
sqlstring
NoSQL query to execute for transforming the data.
dialectstring
NoSQL dialect to use for the query. Set to 'None' for the data plane's default dialect, or 'spark' for Spark SQL.

Property Details

Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

PropertyDefaultTypeRequiredDescription
componentOne of:
  ReadComponent
  TransformComponent
  TaskComponent
  SingularTestComponent
  CustomPythonReadComponent
  WriteComponent
  CompoundComponent
  AliasedTableComponent
  ExternalTableComponent
YesConfiguration options for the component.

TransformComponent

A component that executes SQL or Python code to transform data.

PropertyDefaultTypeRequiredDescription
data_plane  One of:
    SnowflakeDataPlane
    BigQueryDataPlane
    DuckdbDataPlane
    SynapseDataPlane
    FabricDataPlane
    DatabricksDataPlane
NoData Plane-specific configuration options for a component.
descriptionstring
NoA brief description of what the model does.
metadataResourceMetadata
NoMeta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
namestringYesThe name of the model
flow_namestring
NoThe name of the flow that the component belongs to.
skipboolean
NoA boolean flag indicating whether to skip processing for the component or not.
data_maintenanceDataMaintenance
NoThe data maintenance configuration options for the component.
testsComponentTestOptions
NoDefines tests to run on the data of this component.
transformOne of:
  SqlTransform
  PythonTransform
  SnowparkTransform
  PySparkTransform
YesThe transform component that executes SQL or Python code to transform data.

IncrementalStrategy

Incremental Processing Strategy.

PropertyDefaultTypeRequiredDescription
incrementalAny of:
  string
  MergeStrategy
  SCDType2Strategy
YesIncremental processing strategy.
on_schema_changestring ("ignore", "fail", "append_new_columns", "sync_all_columns")
NoPolicy to apply when schema changes are detected. Defaults to 'fail' if not provided.

PartitionedStrategy

Partitioned Ingest Strategy. The user is expected to provide 2 functions, a list function that lists partitions in the source, and a read function that reads a partition from the source.

PropertyDefaultTypeRequiredDescription
partitionedPartitionedOptions
NoOptions for partitioning data.
on_schema_changestring ("ignore", "fail", "append_new_columns", "sync_all_columns")
NoPolicy to apply when schema changes are detected. Defaults to 'fail' if not provided.

PartitionedOptions

Options related to partition optimization - in particular, the policy that determines which partitions to ingest.

PropertyDefaultTypeRequiredDescription
enable_substitution_by_partition_namebooleanYesEnable substitution by partition name.
output_typetablestring ("table", "view")NoOutput type for partitioned data. Must be either 'table' or 'view'. This strategy applies only to Transforms.

SCDType2Strategy

The SCD Type 2 strategy allows users to track changes to records over time, by tracking the start and end times for each version of a record. A brief overview of the strategy can be found at https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row.

PropertyDefaultTypeRequiredDescription
scd_type_2KeyOptions
NoOptions for SCD Type 2 strategy.

InputComponent

Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.

PropertyDefaultTypeRequiredDescription
flowstringYesName of the parent flow that the input component belongs to.
namestringYesThe input component name.
aliasstring
NoThe alias to use for the input component.
partition_specAny of:
  string ("full_reduction", "map")
  RepartitionSpec
NoThe type of partitioning to apply to the component's input data before processing the component's logic. Input partitioning is applied before the component's logic is executed.
wherestring
NoAn optional filter condition to apply to the input component's data.
partition_bindingAny of:
  string
  PartitionBinding
NoAn optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions.

MergeStrategy

A strategy that involves merging new data with existing data by updating existing records that match the unique key.

PropertyDefaultTypeRequiredDescription
mergeKeyOptions
NoOptions for merge strategy.

KeyOptions

Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.

PropertyDefaultTypeRequiredDescription
unique_keystringYesColumn or comma-separated set of columns used as a unique identifier for records, aiding in the merge process.
deletion_columnstring
NoColumn name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy will be able to detect deletions and mark them as deleted in the destination. If not provided, the merge strategy will not be able to detect deletions.
merge_update_columnsAny of:
  string
  array[string]
NoList of columns to include when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_exclude_columns.
merge_exclude_columnsAny of:
  string
  array[string]
NoList of columns to exclude when updating values in merge. These columns are mutually exclusive with respect to the columns in merge_update_columns.
incremental_predicatesAny of:
  string
  array[string]
NoList of conditions to filter incremental data.

PartitionBinding

PropertyDefaultTypeRequiredDescription
logical_operatorlogical_operatorstring ("AND", "OR")NoThe logical operator to use to combine the partition binding predicates provided
predicatespredicatesarray[string]NoThe list of partition binding predicates to apply to the input component's data

RepartitionSpec

Specification for repartitioning operations on input component's data

PropertyDefaultTypeRequiredDescription
repartitionRepartitionOptions
NoOptions for repartitioning the input component's data.

RepartitionOptions

Options for repartitioning the input component's data.

PropertyDefaultTypeRequiredDescription
partition_bystringYesThe column to partition by.
granularitystringYesThe granularity to use for the partitioning.