Partitioned Strategy

Partitioned ingest strategy. You provide two functions: a list function that enumerates the partitions in the source, and a read function that reads a single partition from the source.
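The list/read contract can be sketched in plain Python. This is a minimal illustration, not the product's API: `SOURCE`, `list_partitions`, `read_partition`, `ingest`, and the rule of skipping partitions that were already ingested are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical in-memory "source": partition name -> rows in that partition.
SOURCE: Dict[str, List[dict]] = {
    "2024-01-01": [{"id": 1}, {"id": 2}],
    "2024-01-02": [{"id": 3}],
}


def list_partitions() -> List[str]:
    """List function: return the name of every partition in the source."""
    return sorted(SOURCE)


def read_partition(name: str) -> List[dict]:
    """Read function: return the rows of one named partition."""
    return list(SOURCE[name])


def ingest(
    list_fn: Callable[[], Iterable[str]],
    read_fn: Callable[[str], List[dict]],
    already_ingested: Set[str],
) -> Dict[str, List[dict]]:
    """Read only partitions not seen before (an assumed selection policy)."""
    result: Dict[str, List[dict]] = {}
    for name in list_fn():
        if name not in already_ingested:
            result[name] = read_fn(name)
    return result


# Only the partition that has not been ingested yet is read.
new_data = ingest(list_partitions, read_partition, already_ingested={"2024-01-01"})
```

Keeping listing and reading separate means the caller can decide which partitions to read before any data is loaded.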

PartitionedStrategy

PartitionedStrategy is defined beneath the following ancestor nodes in the YAML structure:

Component
  CustomPythonReadComponent
    CustomPythonReadOptions
  ReadComponent
    AbfsReadComponent
    BigQueryReadComponent
    DatabricksReadComponent
    GcsReadComponent
    GenericFileReadComponent
    LocalFileReadComponent
    MSSQLReadComponent
    MySQLReadComponent
    OracleReadComponent
    PostgresReadComponent
    S3ReadComponent
    SFTPReadComponent
    SnowflakeReadComponent
  TransformComponent
    PySparkTransform
    PythonTransform
    SnowparkTransform
    SqlTransform

Below are the properties for the PartitionedStrategy. Each property links to the specific details section further down in this page.

Property	Default	Type	Required	Description
partitioned			No	Options for partitioning data.
on_schema_change	fail	string ("ignore", "fail", "append_new_columns", "sync_all_columns")	No	Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.
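The allowed values and the default for on_schema_change can be captured in a small check. The sketch below is illustrative only: the function name `resolve_on_schema_change` and the use of a plain dict for the parsed strategy block are assumptions. Only the four policy names and the 'fail' default come from the table above.

```python
from typing import Any, Mapping

# Policy names and default taken from the property table.
ALLOWED_POLICIES = ("ignore", "fail", "append_new_columns", "sync_all_columns")
DEFAULT_POLICY = "fail"


def resolve_on_schema_change(strategy: Mapping[str, Any]) -> str:
    """Return the effective policy for a parsed strategy mapping (hypothetical helper)."""
    policy = strategy.get("on_schema_change")
    if policy is None:
        # Property omitted (or null): fall back to the documented default.
        return DEFAULT_POLICY
    if policy not in ALLOWED_POLICIES:
        raise ValueError(
            f"on_schema_change must be one of {ALLOWED_POLICIES}, got {policy!r}"
        )
    return policy


# A strategy block that sets only `partitioned` resolves to the default.
effective = resolve_on_schema_change(
    {"partitioned": {"enable_substitution_by_partition_name": True}}
)
```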

Property Details

Component

A component is a fundamental building block of a data flow. Supported component types include read, transform, task, test, and more.

Property	Default	Type	Required	Description
component		One of: CustomPythonReadComponent ApplicationComponent AliasedTableComponent ExternalTableComponent	Yes	Configuration options for the component.

CustomPythonReadComponent

A component that reads data using user-defined custom Python code.

Property	Type	Required	Description
data_plane	One of: SnowflakeDataPlane BigQueryDataPlane DatabricksDataPlane	No	Data Plane-specific configuration options for a component.
skip	boolean	No	A boolean flag indicating whether to skip processing for the component or not.
retry_strategy		No	The retry strategy configuration options for the component if any exceptions are encountered.
description	string	No	A brief description of what the model does.
metadata		No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name	string	Yes	The name of the model.
flow_name	string	No	The name of the flow that the component belongs to.
data_maintenance		No	The data maintenance configuration options for the component.
tests		No	Defines tests to run on the data of this component.
custom_python_read		Yes	Configuration options for the Custom Python Read component.

CustomPythonReadOptions

Configuration options for the Custom Python Read component.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
strategy	full	Any of: full IncrementalStrategy PartitionedStrategy	No	Ingest strategy.
python		Any of:	Yes	Python code to execute for ingesting data.

ReadComponent

A component that reads data from a data system.

Property	Type	Required	Description
data_plane	One of: SnowflakeDataPlane BigQueryDataPlane DatabricksDataPlane	No	Data Plane-specific configuration options for a component.
skip	boolean	No	A boolean flag indicating whether to skip processing for the component or not.
retry_strategy		No	The retry strategy configuration options for the component if any exceptions are encountered.
description	string	No	A brief description of what the model does.
metadata		No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name	string	Yes	The name of the model.
flow_name	string	No	The name of the flow that the component belongs to.
data_maintenance		No	The data maintenance configuration options for the component.
tests		No	Defines tests to run on the data of this component.
read	One of: GenericFileReadComponent LocalFileReadComponent SFTPReadComponent S3ReadComponent GcsReadComponent AbfsReadComponent HttpReadComponent MSSQLReadComponent MySQLReadComponent OracleReadComponent PostgresReadComponent SnowflakeReadComponent BigQueryReadComponent DatabricksReadComponent	Yes	The read component that reads data from a data system.

AbfsReadComponent

Component for reading files from an Azure Blob Storage container.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	PartitionedStrategy	No	Ingest strategy when reading files.
abfs		Yes	Options for reading files from an Azure Blob Storage container.

BigQueryReadComponent

A component that reads data from a BigQuery table.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
bigquery	Any of:	Yes

DatabricksReadComponent

A component that reads data from a Databricks table.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
databricks	Any of:	Yes

GcsReadComponent

Component for reading files from a Google Cloud Storage bucket.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	PartitionedStrategy	No	Ingest strategy when reading files.
gcs		Yes	Options for reading files from a Google Cloud Storage bucket.

GenericFileReadComponent

Component for reading files from a filesystem.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	PartitionedStrategy	No	Ingest strategy when reading files.
generic_file		Yes	Options for reading files from a filesystem.

LocalFileReadComponent

Component for reading files from the local filesystem.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	PartitionedStrategy	No	Ingest strategy when reading files.
local_file		Yes	Options for reading files from the local filesystem.

MSSQLReadComponent

A component that reads data from an MSSQL Server database. Options include ingesting a single table or query, or multiple tables or queries.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
mssql	Any of:	Yes

MySQLReadComponent

A component that reads data from a MySQL database. Options include ingesting a single table or query, or multiple tables or queries.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
use_duckdb	boolean	No	Use the DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False.
mysql	Any of:	Yes	MySQL read options.
use_checksum	boolean	No	Use a table checksum to detect data changes. If false or unset, every run performs a full re-read when using full sync.
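The use_checksum behavior described in the table above amounts to a simple decision rule. The sketch below is a hedged illustration: `table_checksum` and `should_reread` are hypothetical names, and the in-memory SHA-256 hash merely stands in for whatever checksum the database actually reports.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def table_checksum(rows: Iterable[Tuple]) -> str:
    """Stand-in checksum: hash the rows in order (assumption, not the real mechanism)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def should_reread(
    use_checksum: Optional[bool], current: str, previous: Optional[str]
) -> bool:
    """Decide whether a full-sync run needs to re-read the table."""
    if not use_checksum:
        # False or unset: always do a full re-read.
        return True
    # Checksum enabled: re-read only on the first run or when the data changed.
    return previous is None or current != previous


rows = [(1, "a"), (2, "b")]
unchanged = should_reread(True, table_checksum(rows), table_checksum(rows))
```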

OracleReadComponent

A component that reads data from an Oracle table.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
connection		string	No	The name of the connection to use for reading data.
columns		array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize		boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case		boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase		boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy		Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options			No	Options for reading from the database or warehouse.
oracle	Oracle	Any of:	No	Oracle read options.

PostgresReadComponent

A component that reads data from a PostgreSQL table.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
connection		string	No	The name of the connection to use for reading data.
columns		array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize		boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case		boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase		boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy		Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options			No	Options for reading from the database or warehouse.
use_duckdb		boolean	No	Use the DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False.
postgres	Postgres	Any of:	No	Postgres read options.

S3ReadComponent

Component for reading files from an S3 bucket.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	PartitionedStrategy	No	Ingest strategy when reading files.
s3		Yes	Options for reading files from an S3 bucket.

SFTPReadComponent

Component for reading files from an SFTP server.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	PartitionedStrategy	No	Ingest strategy when reading files.
sftp		Yes	Options for reading files from an SFTP server.

SnowflakeReadComponent

A component that reads data from a Snowflake table.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
snowflake	Any of:	Yes

TransformComponent

A component that executes SQL or Python code to transform data.

Property	Type	Required	Description
data_plane	One of: SnowflakeDataPlane BigQueryDataPlane DatabricksDataPlane	No	Data Plane-specific configuration options for a component.
skip	boolean	No	A boolean flag indicating whether to skip processing for the component or not.
retry_strategy		No	The retry strategy configuration options for the component if any exceptions are encountered.
description	string	No	A brief description of what the model does.
metadata		No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name	string	Yes	The name of the model.
flow_name	string	No	The name of the flow that the component belongs to.
data_maintenance		No	The data maintenance configuration options for the component.
tests		No	Defines tests to run on the data of this component.
transform	One of: SqlTransform PythonTransform SnowparkTransform PySparkTransform	Yes	The transform component that executes SQL or Python code to transform data.

PySparkTransform

PySpark transforms execute PySpark code to transform data.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
microbatch		boolean	No	Whether to process data in microbatches.
batch_size		string	No	The size/time granularity of the microbatch to process.
lookback	1	integer	No	The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin		string	No	The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs		array[None]	No	List of input components to use as data sources for the transform.
strategy		Any of: PartitionedStrategy IncrementalStrategy string ("view", "table")	No	Transform strategy - incremental, partitioned, or view/table.
pyspark			No	PySpark transform function to execute for transforming the data.

PythonTransform

Python transforms execute Python code to transform data.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
microbatch		boolean	No	Whether to process data in microbatches.
batch_size		string	No	The size/time granularity of the microbatch to process.
lookback	1	integer	No	The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin		string	No	The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs		array[None]	No	List of input components to use as data sources for the transform.
strategy		Any of: PartitionedStrategy IncrementalStrategy string ("view", "table")	No	Transform strategy - incremental, partitioned, or view/table.
python			No	Python transform function to execute for transforming the data.

SnowparkTransform

Snowpark transforms execute Python code to transform data within the Snowflake platform.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
microbatch		boolean	No	Whether to process data in microbatches.
batch_size		string	No	The size/time granularity of the microbatch to process.
lookback	1	integer	No	The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin		string	No	The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs		array[None]	No	List of input components to use as data sources for the transform.
strategy		Any of: PartitionedStrategy IncrementalStrategy string ("view", "table")	No	Transform strategy - incremental, partitioned, or view/table.
snowpark			No	Snowpark transform function to execute for transforming the data.

SqlTransform

SQL transforms execute SQL queries to transform data.

Property	Default	Type	Required	Description
dependencies		array[None]	No	List of dependencies that must complete before this component runs.
event_time		string	No	Timestamp column in the component output used to represent event time.
microbatch		boolean	No	Whether to process data in microbatches.
batch_size		string	No	The size/time granularity of the microbatch to process.
lookback	1	integer	No	The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin		string	No	The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs		array[None]	No	List of input components to use as data sources for the transform.
strategy		Any of: PartitionedStrategy IncrementalStrategy string ("view", "table")	No	Transform strategy - incremental, partitioned, or view/table.
sql		string	No	SQL query to execute for transforming the data.
dialect		spark	No	SQL dialect to use for the query. Set to 'None' for the data plane's default dialect, or 'spark' for Spark SQL.

PartitionedOptions

Options related to partition optimization, in particular the policy that determines which partitions to ingest.

Property	Default	Type	Required	Description
enable_substitution_by_partition_name		boolean	Yes	Enable substitution by partition name.
output_type	table	string ("table", "view")	No	Output type for partitioned data. Must be either 'table' or 'view'. This option applies only to transforms.
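As a rough sketch, the two PartitionedOptions properties can be modeled as a small Python dataclass that enforces the required flag, the 'table' default, and the two allowed output types. The class and the `parse_partitioned` helper are hypothetical and exist only to illustrate the constraints in the table above.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PartitionedOptions:
    # Required: no default, so omitting it raises TypeError on construction.
    enable_substitution_by_partition_name: bool
    # Optional: defaults to "table" per the property table.
    output_type: str = "table"

    def __post_init__(self) -> None:
        if not isinstance(self.enable_substitution_by_partition_name, bool):
            raise TypeError("enable_substitution_by_partition_name must be a boolean")
        if self.output_type not in ("table", "view"):
            raise ValueError("output_type must be either 'table' or 'view'")


def parse_partitioned(options: Mapping[str, Any]) -> PartitionedOptions:
    """Build validated options from a parsed `partitioned` mapping (hypothetical helper)."""
    return PartitionedOptions(**options)


opts = parse_partitioned({"enable_substitution_by_partition_name": True})
```

Unknown keys also raise TypeError here, which is a side effect of the dataclass constructor rather than documented behavior.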