Version: 3.0.0

Writing to a Blob Store

Prerequisites

  • A Connection to the file system you wish to write to
  • An upstream Component with data to write

Create the file

Navigate to your Workspace from the Homepage.

Within the file explorer, open the flow containing the Component you wish to write out.

Create a new file and give it a name with a .yaml extension, e.g. blob_write.yaml.

```
my_project/
├── ascend_project.yaml
├── connections/
│   └── s3_wc.yaml
├── flows/
│   ├── bar_flow/
│   │   ├── bar_flow.yaml
│   │   └── components/
│   │       └── bar_component.yaml
│   └── foo_flow/
│       ├── foo_flow.yaml
│       └── components/
│           ├── foo_component.yaml
│           └── (+ New File) blob_write.yaml
├── profiles/
└── vaults/
```

Write Options

  • connection The name of the Ascend Connection that will be used to write to the blob storage

  • input Specifies the input Component whose data will be written to the blob store

  • <write_connector_target> The name of this option itself determines the type of Write Connector that will be created. Different Write Connectors require different options to specify the file outputs; for a full list of file write connector types, see Write Components. Common to all file Write Connectors, you will need to specify:

    • a path to the directory that files will be written into
    • a formatter; currently only parquet is supported
    • optionally, a partition_template defining a template for naming the partitions
  • strategy Can be one of two options:

    • full performs a full refresh of the target, replacing all records on each Flow Run. Currently only the drop_and_recreate mode is supported for full writes.

    • partitioned writes only the partitions on the target where records have been updated in the input Component. You must specify one of append, insert_overwrite, or sync as the mode to determine how partitions are written to the target. For a full explanation of the partitioned write strategies, see Partitioned Write Strategies.
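Putting these options together, a blob Write Component spec takes the following general shape. This is a sketch assembled from the options above; the partition_template value shown is a hypothetical naming pattern, not a documented default.

```yaml
component:
  write:
    connection: s3_wc              # name of the Ascend Connection to write through
    input:
      name: foo_component          # upstream Component supplying the data
      flow: foo_flow
    strategy:
      partitioned:
        mode: insert_overwrite     # one of: append, insert_overwrite, sync
    s3:                            # <write_connector_target>: determines the Write Connector type
      path: /some_dir              # directory that files will be written into
      formatter: parquet           # currently the only supported formatter
      partition_template: "{date}" # optional; hypothetical pattern for naming partitions
```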

Examples

my_project/flows/foo_flow/components/blob_write.yaml
```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      full:
        mode: drop_and_recreate
    s3:
      path: /some_other_dir
      formatter: parquet
```

In this example, we use an s3 connection to write the contents of foo_component to an s3 bucket with a full write strategy, using the drop_and_recreate mode to replace all Partitions on each Flow Run, regardless of whether any records have been updated.

my_project/flows/bar_flow/components/blob_write.yaml
```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      partitioned:
        mode: append
    s3:
      path: /some_parquet_dir
      formatter: parquet
```

In this example, we use an s3 connection to write the contents of foo_component to an s3 bucket with a partitioned write strategy, using the append mode to add new or updated partitions to the target. Old partitions are preserved.
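The other partitioned modes differ only in the mode value. As a sketch, switching the same component to sync would look like this; see Partitioned Write Strategies for the exact semantics of each mode.

```yaml
component:
  write:
    connection: s3_wc
    input:
      name: foo_component
      flow: foo_flow
    strategy:
      partitioned:
        mode: sync   # see Partitioned Write Strategies for how sync handles partitions
    s3:
      path: /some_parquet_dir
      formatter: parquet
```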