Version: 3.0.0

Writing to a Blob Store

Prerequisites

  • A Connection to the file system you wish to write to
  • An upstream Component with data to write

Create the file

Navigate to your Workspace from the Homepage

Within the file explorer, open up the flow with the Component you wish to write out

Create a new file and give it a name with a .yaml extension, e.g. blob_write.yaml

```
my_project/
├── ascend_project.yaml
├── connections/
│   └── s3_wc.yaml
├── flows/
│   ├── bar_flow/
│   │   ├── bar_flow.yaml
│   │   └── components/
│   │       └── bar_component.yaml
│   └── foo_flow/
│       ├── foo_flow.yaml
│       └── components/
│           ├── foo_component.yaml
│           └── (+ New File) blob_write.yaml
├── profiles/
└── vaults/
```

Write Options

  • connection The name of the Ascend Connection that will be used to write to the blob storage

  • input Specifies the input Component whose data will be written to the blob store

  • <write_connector_target> The name of the option itself defines the type of Write Connector that will be created. Different Write Connectors require different options to specify the file outputs. For a full list of file Write Connector types, see Write Components. Common to all file Write Connectors, however, you will need to specify:

    • a directory path where files will be written
    • a formatter; currently only parquet is supported
    • an optional partition_template that defines a template for naming the partitions
  • strategy Can be one of two options:

    • full performs a full refresh of the target table, replacing all of the records each Flow Run. Currently only drop_and_recreate mode is supported for full writes.

    • partitioned writes only the partitions of the target table where records have been updated in the input Component. One of append, insert_overwrite, or sync must be specified as the mode to determine how partitions are written to the target table. For a full explanation of the partitioned write strategies, see Partitioned Write Strategies.
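Combining the options above, here is a minimal sketch of a partitioned blob write. The connection name, input, path, and mode values are illustrative, and placing partition_template under the s3 block alongside path and formatter is an assumption based on the option list above; the commented line only marks where the optional setting would go.

```yaml
component:
  write:
    connection: s3_wc          # an existing s3 Connection (illustrative name)
    input:
      name: foo_component      # upstream Component to write out
      flow: foo_flow
    strategy:
      partitioned:
        mode: insert_overwrite # or append / sync
    s3:
      path: /my_output_dir     # directory files will be written into
      formatter: parquet       # currently the only supported formatter
      # partition_template: (optional; defines partition naming)
```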

Examples

my_project/flows/foo_flow/components/blob_write.yaml

```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      full:
        mode: drop_and_recreate
    s3:
      path: /some_other_dir
      formatter: parquet
```

In this example, we use an s3 connection to write the contents of foo_component to an S3 bucket with a full write strategy, using the drop_and_recreate mode to replace all Partitions each Flow Run, regardless of whether any records have been updated.

my_project/flows/bar_flow/components/blob_write.yaml

```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      partitioned:
        mode: append
    s3:
      path: /some_parquet_dir
      formatter: parquet
```

In this example, we use an s3 connection to write the contents of foo_component to an S3 bucket with a partitioned write strategy, using the append mode to add new or updated partitions to the target table. Old partitions are preserved.