Partitioned Strategy

Partitioned ingest strategy. The user provides two functions: a list function that lists the partitions in the source, and a read function that reads a single partition from the source.
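
The list/read split described above can be sketched in plain Python. This is a minimal illustration, not the product's API: the names `list_partitions`, `read_partition`, and `ingest`, the in-memory `SOURCE`, and the row-count fingerprint used to decide which partitions to re-read are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory source: partition name -> rows.
SOURCE: Dict[str, List[dict]] = {
    "2024-01-01": [{"id": 1}, {"id": 2}],
    "2024-01-02": [{"id": 3}],
}


def list_partitions() -> Dict[str, str]:
    """List function: map each partition name to a fingerprint of its contents."""
    # The row count stands in for a real fingerprint such as a modification time.
    return {name: str(len(rows)) for name, rows in SOURCE.items()}


def read_partition(name: str) -> List[dict]:
    """Read function: return the rows of a single partition."""
    return list(SOURCE[name])


def ingest(
    seen: Dict[str, str],
    list_fn: Callable[[], Dict[str, str]],
    read_fn: Callable[[str], List[dict]],
) -> Dict[str, List[dict]]:
    """Read only partitions that are new or whose fingerprint changed."""
    loaded: Dict[str, List[dict]] = {}
    for name, fingerprint in sorted(list_fn().items()):
        if seen.get(name) != fingerprint:
            loaded[name] = read_fn(name)
            seen[name] = fingerprint  # remember what was ingested
    return loaded
```

Calling `ingest` twice with the same `seen` dictionary reads every partition on the first call and nothing on the second, unless a partition's fingerprint has changed in between.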

PartitionedStrategy

Below are the properties for the PartitionedStrategy. Each property links to the specific details section further down in this page.

Property | Default | Type | Required | Description
partitioned |  |  | No | Options for partitioning data.
on_schema_change |  | string ("ignore", "fail", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.
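
The four `on_schema_change` values can be illustrated with a small sketch. The page only names the policies and the 'fail' default; the behavior assigned to each policy below is an assumed reading of the names, and `resolve_columns` is a hypothetical helper, not part of the product.

```python
from typing import List

POLICIES = ("ignore", "fail", "append_new_columns", "sync_all_columns")


def resolve_columns(existing: List[str], incoming: List[str], policy: str = "fail") -> List[str]:
    """Return the target column list after applying a policy (assumed semantics)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown on_schema_change policy: {policy!r}")
    if existing == incoming:
        return list(existing)  # no schema change detected
    if policy == "fail":
        raise RuntimeError(f"schema changed: {existing} -> {incoming}")
    if policy == "ignore":
        return list(existing)  # keep the current schema as it is
    if policy == "append_new_columns":
        # keep every existing column and add the ones that are new
        return list(existing) + [c for c in incoming if c not in existing]
    # sync_all_columns: mirror the incoming schema, adding and removing columns
    return list(incoming)
```

For example, with existing columns `["a", "b"]` and incoming columns `["a", "c"]`, this sketch keeps `["a", "b"]` under "ignore", produces `["a", "b", "c"]` under "append_new_columns", and produces `["a", "c"]` under "sync_all_columns".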

Property Details

Component

A Component is a fundamental building block of a data Flow. Supported Component types include: Read, Transform, Task, Test, and more.

Property | Default | Type | Required | Description
component |  | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Component configuration options.

CustomPythonReadComponent

Component that reads data using user-defined custom Python code.

Property | Default | Type | Required | Description
data_plane |  | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component.
skip |  | boolean | No | Boolean flag indicating whether to skip processing for the Component or not.
retry_strategy |  |  | No | Retry strategy configuration options for the Component if any exceptions are encountered.
description |  | string | No | A brief description of what the model does.
metadata |  |  | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name |  | string | Yes | The name of the model
flow_name |  | string | No | Name of the Flow that the Component belongs to.
data_maintenance |  |  | No | The data maintenance configuration options for the Component.
tests |  |  | No | Defines tests to run on this Component's data.
custom_python_read |  |  | Yes |

CustomPythonReadOptions

Configuration options for the Custom Python Read component.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
strategy | full | Any of: full, IncrementalStrategy, PartitionedStrategy | No | Ingest strategy.
python |  | Any of: | Yes | Python code to execute for ingesting data.

ReadComponent

Component that reads data from a system.

Property | Default | Type | Required | Description
data_plane |  | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component.
skip |  | boolean | No | Boolean flag indicating whether to skip processing for the Component or not.
retry_strategy |  |  | No | Retry strategy configuration options for the Component if any exceptions are encountered.
description |  | string | No | A brief description of what the model does.
metadata |  |  | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name |  | string | Yes | The name of the model
flow_name |  | string | No | Name of the Flow that the Component belongs to.
data_maintenance |  |  | No | The data maintenance configuration options for the Component.
tests |  |  | No | Defines tests to run on this Component's data.
read |  | One of: GenericFileReadComponent, LocalFileReadComponent, SFTPReadComponent, S3ReadComponent, GcsReadComponent, AbfsReadComponent, HttpReadComponent, MSSQLReadComponent, MySQLReadComponent, OracleReadComponent, PostgresReadComponent, SnowflakeReadComponent, BigQueryReadComponent, DatabricksReadComponent | Yes | Read component that reads data from a system.
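
Because `read` accepts exactly one of the listed component types, a configuration can be checked for a single source-specific property before it is used. The sketch below assumes the read type is identified by the source-specific property documented for each read component on this page (`s3`, `gcs`, `mysql`, and so on); `source_kind` is a hypothetical helper, and HttpReadComponent is left out because its properties are not listed here.

```python
from typing import Any, Dict

# Source-specific properties taken from the read component tables on this page.
SOURCE_KEYS = (
    "generic_file", "local_file", "sftp", "s3", "gcs", "abfs",
    "mssql", "mysql", "oracle", "postgres",
    "snowflake", "bigquery", "databricks",
)


def source_kind(read_config: Dict[str, Any]) -> str:
    """Return the single source-specific key present in a read configuration."""
    present = [key for key in SOURCE_KEYS if key in read_config]
    if len(present) != 1:
        raise ValueError(f"expected exactly one source key, found {present}")
    return present[0]
```

A configuration such as `{"connection": "my_conn", "s3": {}}` resolves to `"s3"`, while a configuration with no source key or with two of them is rejected.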

AbfsReadComponent

Component for reading files from an Azure Blob Storage container.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | PartitionedStrategy | No | Ingest strategy when reading files.
abfs |  |  | Yes | Options for reading files from an Azure Blob Storage container.
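
The `normalize`, `preserve_case`, and `uppercase` flags, which recur on every read component, all act on output column names. The page does not define the naming convention or how the flags interact, so the sketch below is only one plausible interpretation: the underscore-based convention and the order in which the flags are applied are assumptions, and `rename_columns` is a hypothetical helper.

```python
import re
from typing import List


def rename_columns(
    columns: List[str],
    normalize: bool = False,
    preserve_case: bool = False,
    uppercase: bool = False,
) -> List[str]:
    """Apply the three column-name flags under assumed semantics."""
    renamed = []
    for name in columns:
        if normalize:
            # Assumed convention: runs of non-alphanumerics become one underscore.
            name = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")
            if not preserve_case:
                name = name.lower()
        if uppercase:
            name = name.upper()
        renamed.append(name)
    return renamed
```

Under these assumptions, `rename_columns(["Order ID", "unit-price"], normalize=True)` yields `["order_id", "unit_price"]`, and adding `preserve_case=True` keeps the first name as `"Order_ID"`.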

BigQueryReadComponent

Component that reads data from a BigQuery table.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options |  |  | No | Options for reading from the database or warehouse.
bigquery |  | Any of: | Yes |

DatabricksReadComponent

Component that reads data from a Databricks table.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options |  |  | No | Options for reading from the database or warehouse.
databricks |  | Any of: | Yes |

GcsReadComponent

Component for reading files from a Google Cloud Storage bucket.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | PartitionedStrategy | No | Ingest strategy when reading files.
gcs |  |  | Yes | Options for reading files from a Google Cloud Storage bucket.

GenericFileReadComponent

Component for reading files from a filesystem.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | PartitionedStrategy | No | Ingest strategy when reading files.
generic_file |  |  | Yes | Options for reading files from a filesystem.

LocalFileReadComponent

Component for reading files from the local filesystem.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | PartitionedStrategy | No | Ingest strategy when reading files.
local_file |  |  | Yes | Options for reading files from the local filesystem.

MSSQLReadComponent

Component that reads data from a Microsoft SQL Server database. Options include ingesting a single table or query, or multiple tables or queries.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options |  |  | No | Options for reading from the database or warehouse.
mssql |  | Any of: | Yes |

MySQLReadComponent

Component that reads data from a MySQL database. Options include ingesting a single table or query, or multiple tables or queries.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options |  |  | No | Options for reading from the database or warehouse.
use_duckdb |  | boolean | No | Use DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False
mysql |  | Any of: | Yes | MySQL read options.
use_checksum |  | boolean | No | Use table checksum to detect data changes. If false or unset, will do full re-read for every run for full-sync.
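
The `use_checksum` behavior described above amounts to comparing a checksum of the table against the one recorded on the previous run, and re-reading only when they differ. The sketch below illustrates that decision in plain Python. It hashes rows locally with `hashlib` as a stand-in for a database-side table checksum, and `table_checksum` and `should_reread` are hypothetical names, not part of the product.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def table_checksum(rows: Iterable[tuple]) -> str:
    """Hash every row in order; a stand-in for a database-side table checksum."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def should_reread(
    rows: Iterable[tuple],
    previous_checksum: Optional[str],
    use_checksum: bool = False,
) -> Tuple[bool, Optional[str]]:
    """Return (re-read needed, checksum to store for the next run)."""
    if not use_checksum:
        # Without checksums, every full-sync run re-reads the whole table.
        return True, None
    current = table_checksum(rows)
    return current != previous_checksum, current
```

With `use_checksum=True`, a second call that passes the stored checksum and unchanged rows reports that no re-read is needed.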

OracleReadComponent

Component that reads data from an Oracle table.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options |  |  | No | Options for reading from the database or warehouse.
oracle |  | Oracle Any of: | No | Oracle read options.

PostgresReadComponent

Component that reads data from a PostgreSQL table.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options |  |  | No | Options for reading from the database or warehouse.
use_duckdb |  | boolean | No | Use DuckDB extension for reading data, which is faster but may have memory limitations with very large tables. Defaults to False
postgres |  | Postgres Any of: | No | Postgres read options.

S3ReadComponent

Component for reading files from an S3 bucket.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | PartitionedStrategy | No | Ingest strategy when reading files.
s3 |  |  | Yes | Options for reading files from an S3 bucket.

SFTPReadComponent

Component for reading files from an SFTP server.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | PartitionedStrategy | No | Ingest strategy when reading files.
sftp |  |  | Yes | Options for reading files from an SFTP server.

SnowflakeReadComponent

Component that reads data from a Snowflake table.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
connection |  | string | No | Name of the Connection to use for reading data.
columns |  | array[None] | No | List specifying the columns to read from the source and transformations to make during read.
normalize |  | boolean | No | Boolean flag indicating whether the output column names should be normalized to a standard naming convention after reading.
preserve_case |  | boolean | No | Boolean flag indicating whether the case of the column names should be preserved after reading.
uppercase |  | boolean | No | Boolean flag indicating whether the column names should be transformed to uppercase after reading.
strategy |  | Any of: full, IncrementalReadStrategy, PartitionedStrategy | No | Ingest strategy options.
read_options |  |  | No | Options for reading from the database or warehouse.
snowflake |  | Any of: | Yes |

TransformComponent

Component that executes SQL or Python code to transform data.

Property | Default | Type | Required | Description
data_plane |  | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component.
skip |  | boolean | No | Boolean flag indicating whether to skip processing for the Component or not.
retry_strategy |  |  | No | Retry strategy configuration options for the Component if any exceptions are encountered.
description |  | string | No | A brief description of what the model does.
metadata |  |  | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name |  | string | Yes | The name of the model
flow_name |  | string | No | Name of the Flow that the Component belongs to.
data_maintenance |  |  | No | The data maintenance configuration options for the Component.
tests |  |  | No | Defines tests to run on this Component's data.
transform |  | One of: SqlTransform, PythonTransform, SnowparkTransform, PySparkTransform | Yes | The Transform Component that executes SQL or Python code to transform data.

PySparkTransform

PySpark transforms execute PySpark code to transform data.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
microbatch |  | boolean | No | Whether to process data in microbatches.
batch_size |  | string | No | The size/time granularity of the microbatch to process.
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin |  | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs |  | array[None] | No | List of input components to use as Transform data sources.
strategy |  | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: either incremental, partitioned, or view/table.
pyspark |  |  | No | PySpark transform function to execute for transforming the data.
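
The `lookback` and `begin` properties, which apply to every transform type, together determine which time intervals a time-series run processes. The sketch below shows one way to compute that set. It assumes that `lookback` counts the current interval (so the default of 1 means the current interval only), that an interval is skipped when it starts before `begin`, and that the batch size is already available as a `timedelta`; `intervals_to_process` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def intervals_to_process(
    current_start: datetime,
    batch: timedelta,
    lookback: int = 1,
    begin: Optional[datetime] = None,
) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs for the current interval and earlier ones."""
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    # Oldest interval first, ending with the current interval.
    starts = [current_start - i * batch for i in range(lookback - 1, -1, -1)]
    # Skip intervals that start before the configured beginning of time.
    return [(s, s + batch) for s in starts if begin is None or s >= begin]
```

With daily batches and `lookback=3`, the run covers the current day and the two days before it; setting `begin` to the previous day drops the oldest of the three.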

PythonTransform

Python transforms execute Python code to transform data.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
microbatch |  | boolean | No | Whether to process data in microbatches.
batch_size |  | string | No | The size/time granularity of the microbatch to process.
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin |  | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs |  | array[None] | No | List of input components to use as Transform data sources.
strategy |  | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: either incremental, partitioned, or view/table.
python |  |  | No | Python transform function to execute for transforming the data.

SnowparkTransform

Snowpark transforms execute Python code to transform data within the Snowflake platform.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
microbatch |  | boolean | No | Whether to process data in microbatches.
batch_size |  | string | No | The size/time granularity of the microbatch to process.
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin |  | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs |  | array[None] | No | List of input components to use as Transform data sources.
strategy |  | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: either incremental, partitioned, or view/table.
snowpark |  |  | No | Snowpark transform function to execute for transforming the data.

SqlTransform

SQL transforms execute SQL queries to transform data.

Property | Default | Type | Required | Description
dependencies |  | array[None] | No | List of dependencies that must complete before this Component runs.
event_time |  | string | No | Timestamp column in the component output used to represent event time.
microbatch |  | boolean | No | Whether to process data in microbatches.
batch_size |  | string | No | The size/time granularity of the microbatch to process.
lookback | 1 | integer | No | The number of time intervals prior to the current interval (and inclusive of current interval) to process in time-series processing mode.
begin |  | string | No | The 'beginning of time' for this component. If provided, time intervals before this time will be skipped in a time-series run.
inputs |  | array[None] | No | List of input components to use as Transform data sources.
strategy |  | Any of: PartitionedStrategy, IncrementalStrategy, string ("view", "table") | No | Transform strategy: either incremental, partitioned, or view/table.
sql |  | string | No | SQL query to execute for data.
dialect |  | spark | No | SQL dialect to use for the query. Set to 'None' for the data plane's default dialect, or 'spark' for Spark SQL.

PartitionedOptions

Options related to partition optimization - in particular, the policy that determines which partitions to ingest.

Property | Default | Type | Required | Description
enable_substitution_by_partition_name |  | boolean | Yes | Enable substitution by partition name.
output_type | table | string ("table", "view") | No | Output type for partitioned data. Must be either 'table' or 'view'. This strategy applies only to Transforms.
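
The two PartitionedOptions properties can be mirrored in a small validated structure, which makes the required field and the 'table' default explicit. This is a sketch for illustration only: the dataclass below is hypothetical and is not the product's own configuration class.

```python
from dataclasses import dataclass

ALLOWED_OUTPUT_TYPES = ("table", "view")


@dataclass(frozen=True)
class PartitionedOptions:
    # Required: no default, so it must always be supplied.
    enable_substitution_by_partition_name: bool
    # Optional: defaults to "table", as listed in the table above.
    output_type: str = "table"

    def __post_init__(self) -> None:
        if self.output_type not in ALLOWED_OUTPUT_TYPES:
            raise ValueError(
                f"output_type must be one of {ALLOWED_OUTPUT_TYPES}, got {self.output_type!r}"
            )
```

Constructing `PartitionedOptions(True)` yields an `output_type` of `"table"`, and any value other than `"table"` or `"view"` is rejected at construction time.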