S3 Write Component
Component for writing files to an S3 bucket.
Examples
- s3writecomponent_specific_path.yaml
- s3writecomponent_uppercase_partitioned.yaml
```yaml
component:
  write:
    connection: myS3Connection
    input:
      flow: my_flow
      name: my_input_component
    s3:
      path: my/specific/path/
```
```yaml
component:
  write:
    connection: my-s3-connection
    input:
      flow: my_flow
      name: my_input_component
    uppercase: true
    strategy:
      partitioned:
        mode: sync
        partition_col: my_partition_column
    s3:
      path: my/path/to/data
```
S3WriteComponent
S3WriteComponent is defined beneath the following ancestor nodes in the YAML structure: `component` → `write`.
Below are the properties for the S3WriteComponent. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array | No | List of dependencies that must complete before this component runs. |
connection | | string | Yes | The name of the connection to use for writing data. |
input | | InputComponent | Yes | Input component name. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved when writing. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase when writing. |
strategy | `partitioned: {mode: sync, partition_col: null}` | Any of: snapshot, partitioned | No | Options to use when writing data to file-based components. When using the snapshot strategy without a name, the flow run id is used by default as the snapshot name. |
read_record_chunk_size | 100000 | integer | No | Number of rows to read from the source. If not set, defaults to 100,000 rows. |
target_file_size | 104857600 | integer | No | Target size in bytes of each file to write. If not set, defaults to 100 * (2**20) bytes (100 MB). |
target_records_per_file | | integer | No | Max number of rows to write to each part file. If not set, only the target file size determines the number of rows written to each part file. This setting only applies when writing files in partitions: for the snapshot write strategy it is used only if the path ends with a '/', while for the partitioned write strategy it is always applied. |
s3 | | FileWriteOptionsBase | Yes | Options for writing files to the S3 path. |
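As a complement to the partitioned examples above, here is a minimal sketch of a snapshot-strategy write. The exact keys accepted under `snapshot` are not detailed on this page, so the empty mapping is an assumption; connection and path names are placeholders.

```yaml
component:
  write:
    connection: my-s3-connection
    input:
      flow: my_flow
      name: my_input_component
    strategy:
      # Hypothetical: snapshot options are not listed on this page; with no
      # name set, the flow run id is used as the snapshot name by default.
      snapshot: {}
    s3:
      # Ends with '/', so part files are written into this directory.
      path: my/snapshots/
```

Because the path ends with `/`, part files are written; a path ending in the expected file format (for example `.parquet`) would produce a single file instead.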
Property Details
Component
A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |
WriteComponent
Property | Default | Type | Required | Description |
---|---|---|---|---|
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
description | | string | No | A brief description of what the model does. |
metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
name | | string | Yes | The name of the model. |
flow_name | | string | No | The name of the flow that the component belongs to. |
write | | One of: BigQueryWriteComponent, SnowflakeWriteComponent, S3WriteComponent, SFTPWriteComponent, GcsWriteComponent, AbfsWriteComponent, MySQLWriteComponent, OracleWriteComponent | Yes | |
FileWriteOptionsBase
Resource for formatting files and writing files to a specified path
Property | Default | Type | Required | Description |
---|---|---|---|---|
path | | string | Yes | Path to the directory where files will be written. The path must be relative to the connection's root directory; absolute paths or paths that traverse outside the root are not allowed. For the snapshot write strategy: if the path ends with the expected file format, a single file is written; if the path ends with a '/', part files are written to the specified directory; any other case raises an exception. For the partitioned write strategy, the path always serves as a prefix for file paths, and each partition is written as a part file. |
partition_template | | string | No | A template for partition names that contains variables in curly braces. Every file for a partition will be written to a subdirectory with a name derived from the template interpolated with the partition values. |
formatter | auto | One of: auto, parquet, csv, json | No | Formatter resource for writing files. |
manifest | | ManifestOptions | No | Options for writing a manifest file. If not set, no manifest file will be written. |
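To illustrate how these options combine, here is a sketch of an `s3` block using a partition template, an explicit formatter, and a manifest file. The template variable name is hypothetical, since this page does not list the available template variables.

```yaml
s3:
  path: my/output/data
  # Hypothetical variable name; files for each partition land in a
  # subdirectory derived from this template.
  partition_template: "dt={partition_value}"
  formatter:
    csv: {}   # CSV formatting options are documented separately
  manifest:
    name: manifest.json
```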
AutoFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
auto | | | Yes | Options for formatting auto files. |
CsvFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
csv | | | Yes | Options for formatting CSV files. |
JsonFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
json | | | No | Options for formatting JSON files. |
ManifestOptions
Options for writing a manifest file.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | Yes | Name of the manifest file. |
NoFormatterOptions
No custom formatting options exist for this parser.
No properties defined.
ParquetFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
parquet | | ParquetOptions | Yes | Options for formatting Parquet files. |
ParquetOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
compression | zstd | string ("snappy", "gzip", "lzo", "brotli", "lz4", "zstd", "none") | No | Compression algorithm to use for the parquet file using pyarrow. If not set, defaults to zstd. |
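For example, to override the default `zstd` compression, a parquet formatter block (nested under the `s3` write options) might look like this:

```yaml
formatter:
  parquet:
    compression: snappy   # one of: snappy, gzip, lzo, brotli, lz4, zstd, none
```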
PartitionedWriteStrategy
Container for specifying the partitioned write strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partitioned | | PartitionedWriteStrategyOptions | Yes | Options to use when writing data in partitions to a Write component. |
FullWriteStrategy
Container for specifying the full write strategy used in write components.
Property | Default | Type | Required | Description |
---|---|---|---|---|
full | | FullWriteStrategyOptions | Yes | Options to use when fully writing data to a Write component. |
FullWriteStrategyOptions
Resource options for full writes, including mode selection.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | FullWriteModeEnum | Yes | Specifies the mode to use when fully writing data: 'drop_and_recreate' to drop the output table and recreate it. |
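A full write strategy with the only mode documented here, sketched as YAML:

```yaml
strategy:
  full:
    mode: drop_and_recreate   # drop the output table and recreate it
```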
FullWriteModeEnum
No properties defined.
InputComponent
Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.
Property | Default | Type | Required | Description |
---|---|---|---|---|
flow | | string | Yes | Name of the parent flow that the input component belongs to. |
name | | string | Yes | The input component name. |
alias | | string | No | The alias to use for the input component. |
partition_spec | | Any of: string ("full_reduction", "map"), RepartitionSpec | No | The type of partitioning to apply to the component's input data before processing the component's logic. Input partitioning is applied before the component's logic is executed. |
where | | string | No | An optional filter condition to apply to the input component's data. |
partition_binding | | Any of: string, PartitionBinding | No | An optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions. |
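An input block using the optional fields might look like the following sketch; the alias and filter values are placeholders, not values from this page.

```yaml
input:
  flow: my_flow
  name: my_upstream_component
  alias: src                      # placeholder alias
  partition_spec: full_reduction  # or "map"
  where: "status = 'active'"      # hypothetical filter condition
```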
PartitionBinding
Property | Default | Type | Required | Description |
---|---|---|---|---|
logical_operator | | string ("AND", "OR") | No | The logical operator to use to combine the partition binding predicates provided. |
predicates | | array[string] | No | The list of partition binding predicates to apply to the input component's data. |
PartitionedWriteStrategyOptions
Resource options for incremental writes, including mode selection and criteria for detecting deletions and unique records.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | PartitionedWriteModeEnum | Yes | Specifies the mode to use when writing data in partitions: 'append' to append new or modified partitions, 'insert_overwrite' to insert new partitions and replace/overwrite modified partitions, and 'sync' to encompass both 'insert_overwrite' functionality and to delete partitions when deleted at the source. |
partition_col | | string | No | Column name used for partitioning; uses the internal Ascend partition identifier by default. |
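For instance, an `insert_overwrite` partitioned strategy keyed on a hypothetical date column could be sketched as:

```yaml
strategy:
  partitioned:
    mode: insert_overwrite    # replace/overwrite modified partitions
    partition_col: event_date # hypothetical column; defaults to the
                              # internal Ascend partition identifier
```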
PartitionedWriteModeEnum
No properties defined.
RepartitionSpec
Specification for repartitioning operations on input component's data
Property | Default | Type | Required | Description |
---|---|---|---|---|
repartition | | RepartitionOptions | No | Options for repartitioning the input component's data. |
RepartitionOptions
Options for repartitioning the input component's data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | string | Yes | The column to partition by. |
granularity | | string | Yes | The granularity to use for the partitioning. |
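Putting the two required options together, a repartition spec for an input might be sketched as follows; the column name and granularity value are assumptions, since this page does not enumerate valid granularities.

```yaml
partition_spec:
  repartition:
    partition_by: event_time   # hypothetical column
    granularity: day           # assumed granularity value
```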