Partitioned Write Strategy
Container for specifying the partitioned write strategy.
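PartitionedWriteStrategy is defined beneath the following ancestor nodes in the YAML structure: Component > WriteComponent > one of AbfsWriteComponent, GcsWriteComponent, S3WriteComponent, or SFTPWriteComponent, where it is set through the strategy property.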
Below are the properties for the PartitionedWriteStrategy. Each property is described in more detail in the Property Details section further down this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partitioned | | PartitionedWriteStrategyOptions | Yes | Options to use when writing partitioned data to a Write Component. |
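To make the nesting concrete, here is a minimal Python sketch that builds the mapping a strategy property would hold for this strategy, mirroring the YAML keys strategy > partitioned > mode and partition_col. The helper name partitioned_strategy and the column name event_date are hypothetical; only the key names and the three mode values (taken from PartitionedWriteStrategyOptions below) come from this page.

```python
from typing import Any, Optional

# Mode values listed under PartitionedWriteStrategyOptions on this page.
ALLOWED_MODES = {"append", "insert_overwrite", "sync"}


def partitioned_strategy(mode: str, partition_col: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: return {'partitioned': {...}} for a strategy property."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown partitioned write mode: {mode!r}")
    options: dict[str, Any] = {"mode": mode}  # mode is required
    if partition_col is not None:
        # Optional; when omitted, the internal Ascend partition identifier is used.
        options["partition_col"] = partition_col
    return {"partitioned": options}


strategy = partitioned_strategy("insert_overwrite", partition_col="event_date")
print(strategy)  # {'partitioned': {'mode': 'insert_overwrite', 'partition_col': 'event_date'}}
```

Validating the mode up front mirrors the fact that mode is the only required key; partition_col is simply left out when not given.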
Property Details
Component
A Component is a fundamental building block of a data Flow. Supported Component types include: Read, Transform, Task, Test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent, FivetranComponent | Yes | Component configuration options. |
WriteComponent
Property | Default | Type | Required | Description |
---|---|---|---|---|
skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component. |
retry_strategy | | | No | Retry strategy configuration for the Component, used if any exceptions are encountered. |
description | | string | No | Brief description of what the Component does. |
metadata | | | No | Metadata about the resource. In most cases it does not affect system behavior, but it can be helpful when analyzing project resources. |
name | | string | Yes | Name of the Component. |
flow_name | | string | No | Name of the Flow that the Component belongs to. |
write | | One of: BigQueryWriteComponent, SnowflakeWriteComponent, S3WriteComponent, SFTPWriteComponent, GcsWriteComponent, AbfsWriteComponent, MySQLWriteComponent, OracleWriteComponent, PostgresWriteComponent | Yes | |
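The Required column above, combined with the file-based write Component tables on this page, implies a simple completeness check. The sketch below is a hypothetical Python helper, not part of any Ascend API, that inspects a WriteComponent expressed as a dict mirroring the YAML. It covers only the ABFS, GCS, S3, and SFTP variants, since those are the only write Components whose properties appear here, and it assumes each variant carries exactly one destination block; the Component, Connection, and input names are made up.

```python
from typing import Any

# Required keys from the WriteComponent table and the file-based write tables.
WRITE_COMPONENT_REQUIRED = ("name", "write")
FILE_WRITE_REQUIRED = ("connection", "input")
FILE_DESTINATIONS = ("abfs", "gcs", "s3", "sftp")


def missing_fields(component: dict[str, Any]) -> list[str]:
    """List required fields absent from a file-based WriteComponent mapping."""
    missing = [key for key in WRITE_COMPONENT_REQUIRED if key not in component]
    write = component.get("write") or {}
    missing += [f"write.{key}" for key in FILE_WRITE_REQUIRED if key not in write]
    # Assumption: exactly one destination block (abfs, gcs, s3, or sftp) is expected.
    destinations = [key for key in FILE_DESTINATIONS if key in write]
    if len(destinations) != 1:
        missing.append("write: exactly one of abfs, gcs, s3, sftp")
    return missing


example = {
    "name": "write_orders",           # hypothetical Component name
    "write": {
        "connection": "orders_s3",    # hypothetical Connection name
        "input": "orders_transform",  # hypothetical upstream Component
        "s3": {},                     # S3-specific options are not covered on this page
        "strategy": {"partitioned": {"mode": "append"}},
    },
}
print(missing_fields(example))  # []
print(missing_fields({"write": {"gcs": {}, "s3": {}}}))
```

The second call reports the missing name, connection, and input, plus the ambiguous pair of destination blocks.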
AbfsWriteComponent
Component for writing files to an ABFS container.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
connection | | string | Yes | Name of the Connection to use for writing data. |
input | | | Yes | Name of the input Component. |
normalize | | boolean | No | Boolean flag indicating whether output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | Boolean flag indicating whether the case of column names should be preserved when writing. |
uppercase | | boolean | No | Boolean flag indicating whether column names should be converted to uppercase when writing. |
strategy | full: {mode: drop_and_recreate} | Any of: snapshot, FullWriteStrategy, PartitionedWriteStrategy | No | Options to use when writing data to file-based Components. If the snapshot strategy is used without a name, the Flow run ID is used as the snapshot name. |
read_record_chunk_size | 100000 | integer | No | Number of rows to read from the source per chunk. Defaults to 100,000 rows. |
target_file_size | 104857600 | integer | No | Target size, in bytes, of each file written. Defaults to 100 * (2**20) bytes (100 MiB). |
target_records_per_file | | integer | No | Maximum number of rows to write to each part file. If not set, only the target file size is used to determine the number of rows per part file. This setting applies only when writing files in partitions: with the snapshot write strategy it is used only if the path ends with '/', and with the partitioned write strategy it is always applied. |
abfs | | | Yes | |
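The interaction between target_file_size and target_records_per_file can be illustrated with a rough Python estimate of how many part files a write produces. This is a sketch under stated assumptions, not Ascend's file writer: it assumes a constant serialized row size (avg_row_bytes, a hypothetical parameter) and that, when both settings are present, whichever limit is smaller decides the rows per file. The same properties appear on the GCS, S3, and SFTP write Components.

```python
import math
from typing import Optional

# Default from the table above: 100 * 2**20 bytes.
DEFAULT_TARGET_FILE_SIZE = 100 * (2**20)  # 104,857,600 bytes (100 MiB)


def estimate_part_files(
    total_rows: int,
    avg_row_bytes: int,
    target_file_size: int = DEFAULT_TARGET_FILE_SIZE,
    target_records_per_file: Optional[int] = None,
) -> int:
    """Rough estimate of the number of part files for a write.

    Assumes every row serializes to avg_row_bytes; real formats with
    compression will not match this exactly.
    """
    if total_rows <= 0:
        return 0
    # Rows that fit in one file when only the size target is considered.
    rows_per_file = max(1, target_file_size // avg_row_bytes)
    if target_records_per_file is not None:
        # In this model both limits apply, so the smaller one wins.
        rows_per_file = min(rows_per_file, target_records_per_file)
    return math.ceil(total_rows / rows_per_file)


print(estimate_part_files(1_000_000, avg_row_bytes=1024))  # 10
print(estimate_part_files(1_000_000, avg_row_bytes=1024, target_records_per_file=50_000))  # 20
```

With 1 KiB rows, the 100 MiB default fits 102,400 rows per file, so a row cap of 50,000 becomes the binding limit and doubles the file count.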
GcsWriteComponent
Component for writing files to a GCS bucket.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
connection | | string | Yes | Name of the Connection to use for writing data. |
input | | | Yes | Name of the input Component. |
normalize | | boolean | No | Boolean flag indicating whether output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | Boolean flag indicating whether the case of column names should be preserved when writing. |
uppercase | | boolean | No | Boolean flag indicating whether column names should be converted to uppercase when writing. |
strategy | full: {mode: drop_and_recreate} | Any of: snapshot, FullWriteStrategy, PartitionedWriteStrategy | No | Options to use when writing data to file-based Components. If the snapshot strategy is used without a name, the Flow run ID is used as the snapshot name. |
read_record_chunk_size | 100000 | integer | No | Number of rows to read from the source per chunk. Defaults to 100,000 rows. |
target_file_size | 104857600 | integer | No | Target size, in bytes, of each file written. Defaults to 100 * (2**20) bytes (100 MiB). |
target_records_per_file | | integer | No | Maximum number of rows to write to each part file. If not set, only the target file size is used to determine the number of rows per part file. This setting applies only when writing files in partitions: with the snapshot write strategy it is used only if the path ends with '/', and with the partitioned write strategy it is always applied. |
gcs | | | Yes | |
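The read_record_chunk_size property bounds how many source rows are pulled at a time. As a conceptual illustration only (Ascend's actual reader is not shown on this page), the Python sketch below batches any iterable into lists of at most that many rows; the function name read_in_chunks is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

DEFAULT_READ_RECORD_CHUNK_SIZE = 100_000  # default from the table above


def read_in_chunks(
    rows: Iterable[T], chunk_size: int = DEFAULT_READ_RECORD_CHUNK_SIZE
) -> Iterator[list[T]]:
    """Yield successive lists of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(rows)
    # islice pulls up to chunk_size rows; an empty list means the source is exhausted.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


sizes = [len(chunk) for chunk in read_in_chunks(range(250_000))]
print(sizes)  # [100000, 100000, 50000]
```

Streaming in fixed-size batches keeps memory bounded by the chunk size rather than by the size of the source.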
S3WriteComponent
Component for writing files to an S3 bucket.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
connection | | string | Yes | Name of the Connection to use for writing data. |
input | | | Yes | Name of the input Component. |
normalize | | boolean | No | Boolean flag indicating whether output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | Boolean flag indicating whether the case of column names should be preserved when writing. |
uppercase | | boolean | No | Boolean flag indicating whether column names should be converted to uppercase when writing. |
strategy | full: {mode: drop_and_recreate} | Any of: snapshot, FullWriteStrategy, PartitionedWriteStrategy | No | Options to use when writing data to file-based Components. If the snapshot strategy is used without a name, the Flow run ID is used as the snapshot name. |
read_record_chunk_size | 100000 | integer | No | Number of rows to read from the source per chunk. Defaults to 100,000 rows. |
target_file_size | 104857600 | integer | No | Target size, in bytes, of each file written. Defaults to 100 * (2**20) bytes (100 MiB). |
target_records_per_file | | integer | No | Maximum number of rows to write to each part file. If not set, only the target file size is used to determine the number of rows per part file. This setting applies only when writing files in partitions: with the snapshot write strategy it is used only if the path ends with '/', and with the partitioned write strategy it is always applied. |
s3 | | | Yes | |
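The strategy row documents two defaults: a full strategy with mode drop_and_recreate when no strategy is given, and the Flow run ID as the snapshot name when a snapshot has no name. The Python sketch below models both rules. The function names are hypothetical, and the shape of a snapshot entry (either the bare string "snapshot" or a mapping with an optional name key) is an assumption, since this page does not document it.

```python
from typing import Any, Optional, Union

# Default from the strategy row above: full: {mode: drop_and_recreate}
DEFAULT_STRATEGY: dict[str, Any] = {"full": {"mode": "drop_and_recreate"}}

Strategy = Union[str, dict[str, Any]]


def effective_strategy(write_options: dict[str, Any]) -> Strategy:
    """Return the configured strategy, or the documented default if none is set."""
    return write_options.get("strategy", DEFAULT_STRATEGY)


def snapshot_name(strategy: Strategy, flow_run_id: str) -> Optional[str]:
    """Snapshot name to use, or None when the strategy is not a snapshot.

    ASSUMPTION: a snapshot strategy is written either as the bare string
    "snapshot" or as {"snapshot": {"name": ...}}.
    """
    if strategy == "snapshot":
        return flow_run_id
    if isinstance(strategy, dict) and "snapshot" in strategy:
        options = strategy["snapshot"] or {}
        # Fall back to the Flow run ID when no name is given.
        return options.get("name") or flow_run_id
    return None


print(effective_strategy({}))  # {'full': {'mode': 'drop_and_recreate'}}
print(snapshot_name({"snapshot": {}}, flow_run_id="run-123"))  # run-123
```

Here run-123 stands in for a real Flow run ID.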
SFTPWriteComponent
Component for writing files to an SFTP server.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
connection | | string | Yes | Name of the Connection to use for writing data. |
input | | | Yes | Name of the input Component. |
normalize | | boolean | No | Boolean flag indicating whether output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | Boolean flag indicating whether the case of column names should be preserved when writing. |
uppercase | | boolean | No | Boolean flag indicating whether column names should be converted to uppercase when writing. |
strategy | full: {mode: drop_and_recreate} | Any of: snapshot, FullWriteStrategy, PartitionedWriteStrategy | No | Options to use when writing data to file-based Components. If the snapshot strategy is used without a name, the Flow run ID is used as the snapshot name. |
read_record_chunk_size | 100000 | integer | No | Number of rows to read from the source per chunk. Defaults to 100,000 rows. |
target_file_size | 104857600 | integer | No | Target size, in bytes, of each file written. Defaults to 100 * (2**20) bytes (100 MiB). |
target_records_per_file | | integer | No | Maximum number of rows to write to each part file. If not set, only the target file size is used to determine the number of rows per part file. This setting applies only when writing files in partitions: with the snapshot write strategy it is used only if the path ends with '/', and with the partitioned write strategy it is always applied. |
sftp | | | Yes | |
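The applicability rule for target_records_per_file can be written down as a small decision function. This Python sketch encodes only what the table states: always for partitioned writes, and for snapshot writes only when the path ends with '/'. Other strategies return None because the page does not spell them out. The function name, the strategy_kind strings, and the example paths are hypothetical.

```python
from typing import Optional


def records_cap_applies(strategy_kind: str, path: str) -> Optional[bool]:
    """Whether target_records_per_file is honored, per the table above.

    strategy_kind is "partitioned", "snapshot", or another strategy name.
    Returns None for cases this page does not specify.
    """
    if strategy_kind == "partitioned":
        return True                # always applied for partitioned writes
    if strategy_kind == "snapshot":
        return path.endswith("/")  # only when the path ends with '/'
    return None                    # e.g. "full": not specified on this page


print(records_cap_applies("snapshot", "/exports/orders/"))         # True
print(records_cap_applies("snapshot", "/exports/orders.parquet"))  # False
```

Returning None instead of guessing keeps undocumented cases visible to the caller.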
PartitionedWriteStrategyOptions
Resource options for partitioned writes, including the write mode and the column used for partitioning.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | PartitionedWriteModeEnum | Yes | Mode to use when writing data in partitions: 'append' appends new or modified partitions; 'insert_overwrite' inserts new partitions and overwrites modified ones; 'sync' does everything 'insert_overwrite' does and also deletes partitions that were deleted at the source. |
partition_col | | string | No | Name of the column used for partitioning. Defaults to the internal Ascend partition identifier. |
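The difference between the three modes is easiest to see on a toy model. In the Python sketch below, a destination is a dict from partition key to rows, changed holds partitions that are new or modified at the source, and deleted holds keys of partitions removed at the source. This is an illustration of the descriptions above, not Ascend's implementation; in particular, the assumption that append keeps earlier output for a modified partition alongside the new data is this sketch's reading of "append". The date keys and row values are made up.

```python
from typing import Dict, List, Set

Partitions = Dict[str, List[str]]  # partition key -> rows (toy representation)


def apply_partitioned_write(
    destination: Partitions, changed: Partitions, deleted: Set[str], mode: str
) -> Partitions:
    """Toy model of the append, insert_overwrite, and sync modes."""
    result = {key: list(rows) for key, rows in destination.items()}  # never mutate input
    if mode == "append":
        # Add the new data; earlier output for a modified partition is kept.
        for key, rows in changed.items():
            result.setdefault(key, []).extend(rows)
    elif mode in ("insert_overwrite", "sync"):
        # Insert new partitions and replace modified ones wholesale.
        for key, rows in changed.items():
            result[key] = list(rows)
        if mode == "sync":
            # sync additionally drops partitions deleted at the source.
            for key in deleted:
                result.pop(key, None)
    else:
        raise ValueError(f"unknown partitioned write mode: {mode!r}")
    return result


destination = {"2024-01-01": ["a"], "2024-01-02": ["b"]}
changed = {"2024-01-02": ["b2"], "2024-01-03": ["c"]}
deleted = {"2024-01-01"}
for mode in ("append", "insert_overwrite", "sync"):
    print(mode, apply_partitioned_write(destination, changed, deleted, mode))
```

Only sync removes the 2024-01-01 partition, and only append leaves the old "b" row next to its replacement "b2".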
PartitionedWriteModeEnum
Enumeration of the partitioned write modes: append, insert_overwrite, and sync. It defines no properties.