S3 Write Component
Component for writing files to an S3 bucket.
Examples
s3writecomponent_specific_path.yaml

component:
  write:
    connection: myS3Connection
    input:
      flow: my_flow
      name: my_input_component
    s3:
      path: my/specific/path/

s3writecomponent_uppercase_partitioned.yaml

component:
  write:
    connection: my-s3-connection
    input:
      flow: my_flow
      name: my_input_component
    uppercase: true
    strategy:
      partitioned:
        mode: sync
        partition_col: my_partition_column
    s3:
      path: my/path/to/data
S3WriteComponent
S3WriteComponent is defined beneath the following ancestor nodes in the YAML structure: Component, WriteComponent.
Below are the properties for the S3WriteComponent. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
connection | | string | Yes | The name of the connection to use for writing data. |
input | | InputComponent | Yes | Input component name. |
normalize | | boolean | No | A boolean flag indicating if the output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | A boolean flag indicating if the case of the column names should be preserved when writing. |
uppercase | | boolean | No | A boolean flag indicating if the column names should be transformed to uppercase when writing. |
strategy | partitioned: {mode: sync, partition_col: null} | Any of: PartitionedWriteStrategy, FullWriteStrategy | No | Options to use when writing data to file-based components. |
s3 | | FileWriteOptionsBase | Yes | |
Property Details
Component
A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: ReadComponent, TransformComponent, TaskComponent, SingularTestComponent, CustomPythonReadComponent, WriteComponent, CompoundComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |
WriteComponent
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Metadata about the resource. In most cases it does not affect system behavior, but it can help when analyzing project resources. |
flow_name | | string | No | The name of the flow that the component belongs to. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component. |
skip_for_time_series_runs | | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. |
write | | One of: BigQueryWriteComponent, SnowflakeWriteComponent, S3WriteComponent, MySQLWriteComponent, OracleWriteComponent | Yes | |
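As a rough illustration of how the WriteComponent properties in the table above sit alongside `write`, the sketch below adds a name, description, and skip flag to the first example on this page. The name `my_s3_write` and the description text are placeholders rather than values from this reference, and the nesting assumes these properties live directly under `component`, as the Component and WriteComponent tables imply.

```yaml
component:
  # Hypothetical name and description, for illustration only.
  name: my_s3_write
  description: Writes my_input_component to S3.
  # Set to true to skip processing for this component.
  skip: false
  write:
    connection: myS3Connection
    input:
      flow: my_flow
      name: my_input_component
    s3:
      path: my/specific/path/
```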
FileWriteOptionsBase
Resource for formatting files and writing them to a specified path.
Property | Default | Type | Required | Description |
---|---|---|---|---|
path | | string | Yes | Path to the directory to write to. The path is relative to the connection's root directory; it cannot be an absolute path or traverse outside the root directory. |
partition_template | | string | No | A template for partition names that contains variables in curly braces. Every file for a partition is written to a subdirectory whose name is the template interpolated with the partition values. |
formatter | auto | One of: string ("auto") or AutoFormatter; string ("parquet") or ParquetFormatter; string ("csv") or CsvFormatter; string ("json") or JsonFormatter | No | Formatter resource for writing files. |
manifest | | ManifestOptions | No | Options for writing a manifest file. If not set, no manifest file will be written. |
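A minimal sketch of an `s3` block that combines the options in the table above. This reference does not list which variables a partition template may use, so `{my_partition_column}` is an assumption, and the manifest file name is a placeholder. The template is quoted because an unquoted leading `{` would be parsed by YAML as the start of a flow mapping.

```yaml
s3:
  path: my/path/to/data
  # Assumed template variable; confirm which variables your partitions expose.
  partition_template: "{my_partition_column}"
  # String shorthand for the Parquet formatter.
  formatter: parquet
  manifest:
    # Hypothetical manifest file name.
    name: manifest.json
```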
AutoFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
auto | | NoFormatterOptions | Yes | |
CsvFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
csv | | NoFormatterOptions | Yes | |
JsonFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
json | | NoFormatterOptions | No | |
ManifestOptions
Options for writing a manifest file.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | Yes | Name of the manifest file. |
ParquetFormatter
Property | Default | Type | Required | Description |
---|---|---|---|---|
parquet | | NoFormatterOptions | Yes | |
NoFormatterOptions
No custom formatting options exist for this formatter.
No properties defined.
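The `formatter` property accepts either a bare string or a single-key formatter object. Because NoFormatterOptions defines no properties, the object form presumably carries an empty mapping. The sketch below shows both spellings as two separate YAML documents, on the assumption that they are equivalent.

```yaml
# String shorthand.
s3:
  path: my/path/to/data
  formatter: csv
---
# Object form (CsvFormatter) with empty NoFormatterOptions.
s3:
  path: my/path/to/data
  formatter:
    csv: {}
```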
PartitionedWriteStrategy
Container for specifying the partitioned write strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partitioned | | PartitionedWriteStrategyOptions | Yes | Options to use when writing data in partitions to a Write component. |
FullWriteStrategy
Container for specifying the full write strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
full | | FullWriteStrategyOptions | Yes | Options to use when fully writing data to a Write component. |
FullWriteStrategyOptions
Resource options for full writes, including mode selection.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | FullWriteModeEnum | Yes | Specifies the mode to use when fully writing data: 'drop_and_recreate' to drop the output table and recreate it. |
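For comparison with the partitioned example at the top of this page, a full write strategy might be configured as sketched below. The connection, flow, and path values are reused from the page's examples; `drop_and_recreate` is the only mode this reference names.

```yaml
component:
  write:
    connection: my-s3-connection
    input:
      flow: my_flow
      name: my_input_component
    strategy:
      full:
        # Drop the existing output and recreate it on each write.
        mode: drop_and_recreate
    s3:
      path: my/path/to/data
```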
FullWriteModeEnum
No properties defined.
InputComponent
Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.
Property | Default | Type | Required | Description |
---|---|---|---|---|
flow | | string | Yes | Name of the parent flow that the input component belongs to. |
name | | string | Yes | The input component name. |
alias | | string | No | The alias to use for the input component. |
partition_spec | | Any of: string ("full_reduction", "map"), RepartitionSpec | No | The type of partitioning to apply to the component's input data. Input partitioning is applied before the component's logic is executed. |
where | | string | No | An optional filter condition to apply to the input component's data. |
partition_binding | | Any of: string, PartitionBinding | No | An optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions. |
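A sketch of an `input` block that uses some of the optional properties above. The alias and the filter column are hypothetical, and the `where` value assumes a SQL-style predicate, which this reference does not confirm.

```yaml
input:
  flow: my_flow
  name: my_input_component
  # Hypothetical alias for the input.
  alias: my_alias
  # One of the two string values listed for partition_spec.
  partition_spec: full_reduction
  # Assumed SQL-style filter; the column name is a placeholder.
  where: "my_column IS NOT NULL"
```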
PartitionBinding
Property | Default | Type | Required | Description |
---|---|---|---|---|
logical_operator | logical_operator | string ("AND", "OR") | No | The logical operator to use to combine the partition binding predicates provided |
predicates | predicates | array[string] | No | The list of partition binding predicates to apply to the input component's data |
PartitionedWriteStrategyOptions
Resource options for partitioned writes, including mode selection and the partition column.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | PartitionedWriteModeEnum | Yes | Specifies the mode to use when writing data in partitions: 'append' to append new or modified partitions, 'insert_overwrite' to insert new partitions and replace/overwrite modified partitions, and 'sync' to do everything 'insert_overwrite' does and also delete partitions that were deleted at the source. |
partition_col | | string | No | Column name used for partitioning. Uses the internal Ascend partition identifier by default. |
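The three modes differ only in the `mode` value. As a sketch, a strategy that replaces modified partitions but never deletes them could look like the following; `my_partition_column` is the placeholder column name used elsewhere on this page.

```yaml
strategy:
  partitioned:
    # Insert new partitions and overwrite modified ones; no deletions.
    mode: insert_overwrite
    # Omit partition_col to fall back to the internal partition identifier.
    partition_col: my_partition_column
```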
PartitionedWriteModeEnum
No properties defined.
RepartitionSpec
Specification for repartitioning an input component's data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
repartition | | RepartitionOptions | No | Options for repartitioning the input component's data. |
RepartitionOptions
Options for repartitioning the input component's data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | string | Yes | The column to partition by. |
granularity | | string | Yes | The granularity to use for the partitioning. |
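A sketch of how a RepartitionSpec might be nested under an input's `partition_spec`, based on the tables above. This reference does not list the accepted `granularity` values, so `day` is an assumption, and `my_timestamp_column` is a hypothetical column name.

```yaml
input:
  flow: my_flow
  name: my_input_component
  partition_spec:
    repartition:
      # Hypothetical column to partition by.
      partition_by: my_timestamp_column
      # Assumed value; check the accepted granularities before use.
      granularity: day
```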