S3 Write Component
Component for writing files to an S3 bucket.
Examples
- s3writecomponent_specific_path.yaml
- s3writecomponent_uppercase_partitioned.yaml
```yaml
component:
  write:
    connection: myS3Connection
    input:
      flow: my_flow
      name: my_input_component
    s3:
      path: my/specific/path/
```
```yaml
component:
  write:
    connection: my-s3-connection
    input:
      flow: my_flow
      name: my_input_component
    uppercase: true
    strategy:
      partitioned:
        mode: sync
        partition_col: my_partition_column
    s3:
      path: my/path/to/data
```
S3WriteComponent
S3WriteComponent is defined beneath the following ancestor nodes in the YAML structure: `component` → `write`.
Below are the properties for the S3WriteComponent. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
dependencies | | array | No | List of dependencies that must complete before this component runs. |
connection | | string | Yes | The name of the connection to use for writing data. |
input | | InputComponent | Yes | Input component name. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved when writing. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase when writing. |
strategy | `partitioned: {mode: sync, partition_col: null}` | Any of: snapshot, partitioned | No | Options to use when writing data to file-based components. When using the snapshot strategy without a name, the flow run id is used by default as the snapshot name. |
read_record_chunk_size | 100000 | integer | No | Number of rows to read from the source. If not set, defaults to 100,000 rows. |
target_file_size | 104857600 | integer | No | Target size in bytes of each file to write. If not set, defaults to 100 * (2**20) bytes (100 MB). |
target_records_per_file | | integer | No | Max number of rows to write to each part file. If not set, only the target file size determines the number of rows written to each part file. This setting only applies when writing files in partitions: for the snapshot write strategy it is used only if the path ends with a '/', while for the partitioned write strategy it is always applied. |
s3 | | FileWriteOptionsBase | Yes | Options for writing files to the S3 path. |
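As a complement to the partitioned examples above, here is a minimal sketch of a snapshot-strategy write. The exact keys accepted under `snapshot` are not detailed on this page, so the empty mapping is an assumption; connection and path names are placeholders.

```yaml
component:
  write:
    connection: my-s3-connection
    input:
      flow: my_flow
      name: my_input_component
    strategy:
      # Hypothetical: snapshot options are not listed on this page; with no
      # name set, the flow run id is used as the snapshot name by default.
      snapshot: {}
    s3:
      # Ends with '/', so part files are written into this directory.
      path: my/snapshots/
```

Because the path ends with `/`, part files are written; a path ending in the expected file format (for example `.parquet`) would produce a single file instead.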
Property Details
Component
A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |
WriteComponent
Property | Default | Type | Required | Description |
---|---|---|---|---|
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
description | | string | No | A brief description of what the model does. |
metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
name | | string | Yes | The name of the model. |
flow_name | | string | No | The name of the flow that the component belongs to. |
write | | One of: BigQueryWriteComponent, SnowflakeWriteComponent, S3WriteComponent, SFTPWriteComponent, GcsWriteComponent, AbfsWriteComponent, MySQLWriteComponent, OracleWriteComponent | Yes | |
FileWriteOptionsBase
Resource for formatting files and writing files to a specified path
Property | Default | Type | Required | Description |
---|---|---|---|---|
path | | string | Yes | Path to the directory where files will be written. The path must be relative to the connection's root directory; absolute paths or paths that traverse outside the root are not allowed. For the snapshot write strategy: if the path ends with the expected file format, a single file is written; if the path ends with a '/', part files are written to the specified directory; any other case raises an exception. For the partitioned write strategy, the path always serves as a prefix for file paths, and each partition is written as a part file. |
partition_template | | string | No | A template for partition names that contains variables in curly braces. Every file for a partition will be written to a subdirectory with a name derived from the template interpolated with the partition values. |
formatter | auto | One of: auto, parquet, csv, json | No | Formatter resource for writing files. |
manifest | | ManifestOptions | No | Options for writing a manifest file. If not set, no manifest file will be written. |
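To illustrate how these options combine, here is a sketch of an `s3` block using a partition template, an explicit formatter, and a manifest file. The template variable name is hypothetical, since this page does not list the available template variables.

```yaml
s3:
  path: my/output/data
  # Hypothetical variable name; files for each partition land in a
  # subdirectory derived from this template.
  partition_template: "dt={partition_value}"
  formatter:
    csv: {}   # CSV formatting options are documented separately
  manifest:
    name: manifest.json
```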
AutoFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
auto | | | Yes | Options for formatting auto files. |
CsvFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
csv | | | Yes | Options for formatting CSV files. |
JsonFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
json | | | No | Options for formatting JSON files. |
ManifestOptions
Options for writing a manifest file.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | Yes | Name of the manifest file. |
NoFormatterOptions
No custom formatting options exist for this parser.
No properties defined.
ParquetFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
parquet | | ParquetOptions | Yes | Options for formatting Parquet files. |
ParquetOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
compression | zstd | string ("snappy", "gzip", "lzo", "brotli", "lz4", "zstd", "none") | No | Compression algorithm to use for the parquet file using pyarrow. If not set, defaults to zstd. |
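For example, to override the default `zstd` compression, a parquet formatter block (nested under the `s3` write options) might look like this:

```yaml
formatter:
  parquet:
    compression: snappy   # one of: snappy, gzip, lzo, brotli, lz4, zstd, none
```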
PartitionedWriteStrategy
Container for specifying the partitioned write strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partitioned | | PartitionedWriteStrategyOptions | Yes | Options to use when writing data in partitions to a Write component. |
FullWriteStrategy
Container for specifying the full write strategy used in write components.
Property | Default | Type | Required | Description |
---|---|---|---|---|
full | | FullWriteStrategyOptions | Yes | Options to use when fully writing data to a Write component. |
FullWriteStrategyOptions
Resource options for full writes, including mode selection.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | FullWriteModeEnum | Yes | Specifies the mode to use when fully writing data: 'drop_and_recreate' to drop the output table and recreate it. |
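A full write strategy with the only mode documented here, sketched as YAML:

```yaml
strategy:
  full:
    mode: drop_and_recreate   # drop the output table and recreate it
```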
FullWriteModeEnum
No properties defined.
InputComponent
Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.
Property | Default | Type | Required | Description |
---|---|---|---|---|
flow | | string | Yes | Name of the parent flow that the input component belongs to. |
name | | string | Yes | The input component name. |
alias | | string | No | The alias to use for the input component. |
partition_spec | | Any of: string ("full_reduction", "map"), RepartitionSpec | No | The type of partitioning to apply to the component's input data before processing the component's logic. Input partitioning is applied before the component's logic is executed. |
where | | string | No | An optional filter condition to apply to the input component's data. |
partition_binding | | Any of: string, PartitionBinding | No | An optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions. |
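An input block using the optional fields might look like the following sketch; the alias and filter values are placeholders, not values from this page.

```yaml
input:
  flow: my_flow
  name: my_upstream_component
  alias: src                      # placeholder alias
  partition_spec: full_reduction  # or "map"
  where: "status = 'active'"      # hypothetical filter condition
```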
PartitionBinding
Property | Default | Type | Required | Description |
---|---|---|---|---|
logical_operator | | string ("AND", "OR") | No | The logical operator to use to combine the partition binding predicates provided. |
predicates | | array[string] | No | The list of partition binding predicates to apply to the input component's data. |
PartitionedWriteStrategyOptions
Resource options for incremental writes, including mode selection and criteria for detecting deletions and unique records.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | PartitionedWriteModeEnum | Yes | Specifies the mode to use when writing data in partitions: 'append' to append new or modified partitions, 'insert_overwrite' to insert new partitions and replace/overwrite modified partitions, and 'sync' to encompass both 'insert_overwrite' functionality and to delete partitions when deleted at the source. |
partition_col | | string | No | Column name used for partitioning; uses the internal Ascend partition identifier by default. |
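For instance, an `insert_overwrite` partitioned strategy keyed on a hypothetical date column could be sketched as:

```yaml
strategy:
  partitioned:
    mode: insert_overwrite    # replace/overwrite modified partitions
    partition_col: event_date # hypothetical column; defaults to the
                              # internal Ascend partition identifier
```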
PartitionedWriteModeEnum
No properties defined.
RepartitionSpec
Specification for repartitioning operations on input component's data
Property | Default | Type | Required | Description |
---|---|---|---|---|
repartition | | RepartitionOptions | No | Options for repartitioning the input component's data. |
RepartitionOptions
Options for repartitioning the input component's data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | string | Yes | The column to partition by. |
granularity | | string | Yes | The granularity to use for the partitioning. |
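Putting the two required options together, a repartition spec for an input might be sketched as follows; the column name and granularity value are assumptions, since this page does not enumerate valid granularities.

```yaml
partition_spec:
  repartition:
    partition_by: event_time   # hypothetical column
    granularity: day           # assumed granularity value
```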