# Writing to a Blob Store
## Prerequisites
- A Connection to the file system you wish to write to
- An upstream Component with data to write
## Create the file

1. Navigate to your Workspace from the Homepage.
2. Within the file explorer, open the flow containing the Component you wish to write out.
3. Create a new file and give it a name with a `.yaml` extension, e.g. `blob_write.yaml`:
```
my_project/
├── ascend_project.yaml
├── connections/
│   └── s3_wc.yaml
├── flows/
│   ├── bar_flow/
│   │   ├── bar_flow.yaml
│   │   └── components/
│   │       └── bar_component.yaml
│   └── foo_flow/
│       ├── foo_flow.yaml
│       └── components/
│           ├── foo_component.yaml
│           └── (+ New File) blob_write.yaml
├── profiles/
└── vaults/
```
## Write Options
- `connection`: The name of the Ascend Connection that will be used to write to the blob store.
- `input`: Specifies the input Component whose data will be written out.
- `<write_connector_target>`: The name of this option itself defines the type of Write Connector that will be created. Different Write Connectors require different options to specify the file outputs; for a full list of file Write Connector types, see Write Components. Common to all file Write Connectors, you will need to specify:
  - a directory `path` where files will be written
  - a `formatter`; currently only `parquet` is supported
  - optionally, a `partition_template` defining a template for naming the partitions
- `strategy`: Can be one of two options:
  - `full` performs a full refresh of the target table, replacing all of the records each Flow Run. Currently only the `drop_and_recreate` mode is supported for full writes.
  - `partitioned` writes only the partitions on the target table where records have been updated in the input Component. You must specify one of `append`, `insert_overwrite`, or `sync` as the mode to determine how partitions are written to the target table. For a full explanation of the partitioned write strategies, see Partitioned Write Strategies.
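Putting these options together, a minimal configuration skeleton might look like the following. This is an illustrative sketch: the angle-bracketed placeholders are not literal syntax, and the `s3` key is just one possible `<write_connector_target>` — the key you use depends on your blob store type.

```yaml
component:
  write:
    connection: <connection_name>   # an Ascend Connection to your blob store
    input:
      name: <upstream_component>    # the Component whose data is written
      flow: <flow_name>
    strategy:
      full:                         # or `partitioned:` with mode append/insert_overwrite/sync
        mode: drop_and_recreate
    s3:                             # <write_connector_target>: the key names the connector type
      path: /<output_directory>
      formatter: parquet            # currently the only supported formatter
```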
## Examples
```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      full:
        mode: drop_and_recreate
    s3:
      path: /some_other_dir
      formatter: parquet
```
In this example, we use an `s3` connection to write the contents of `foo_component` to an S3 bucket with a `full` write strategy, using the `drop_and_recreate` mode to replace all Partitions each Flow Run, regardless of whether any records have been updated.
```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      partitioned:
        mode: append
    s3:
      path: /some_parquet_dir
      formatter: parquet
```
In this example, we use an `s3` connection to write the contents of `foo_component` to an S3 bucket with a `partitioned` write strategy, using the `append` mode to add new or updated partitions to the target table. Old partitions are preserved.
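For comparison, here is a sketch of the same partitioned write using the `insert_overwrite` mode instead. This example is assembled from the documented options rather than taken from a tested configuration; see Partitioned Write Strategies for the exact semantics of each mode.

```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      partitioned:
        mode: insert_overwrite  # one of the documented modes: append, insert_overwrite, sync
    s3:
      path: /some_parquet_dir
      formatter: parquet
```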