Version: 3.0.0

Write output formats

Ascend supports three primary write strategies when writing data to blob storage systems: full, snapshot, and partitioned. Each is optimized for different use cases and data sizes.

Full write strategy

The full write strategy replaces all data in the target location during each flow run, ensuring a complete refresh. It treats the data as a single logical unit, with no awareness of individual chunks or partitions.

Behavior: Always produces multiple chunk files with the naming pattern part_<chunkid>.<extension>

Example output: part_001.parquet, part_002.parquet, part_003.parquet, etc.
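
The replace-everything behavior can be illustrated with a minimal local-filesystem sketch. This is not Ascend's implementation: the function name full_write, the three-digit chunk IDs starting at 001, and the use of a local directory in place of blob storage are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def full_write(target_dir: Path, chunks: list[bytes], extension: str = "parquet") -> list[str]:
    """Hypothetical helper: replace everything in target_dir with new chunk files."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # Remove files left over from the previous run so the result is a complete refresh.
    for old_file in target_dir.iterdir():
        if old_file.is_file():
            old_file.unlink()
    written = []
    # Assumption: chunk IDs are three digits wide and start at 001, as in the examples.
    for chunk_id, payload in enumerate(chunks, start=1):
        name = f"part_{chunk_id:03d}.{extension}"
        (target_dir / name).write_bytes(payload)
        written.append(name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output"
    full_write(target, [b"a", b"b", b"c"])          # first run writes three chunks
    second_run = full_write(target, [b"x", b"y"])   # second run writes two chunks
    remaining = sorted(p.name for p in target.iterdir())
```

After the second run, only the two newly written files remain in the target directory. Nothing from the first run survives.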

Snapshot write strategy

The snapshot strategy provides flexible output options based on your path specification. This is the only strategy that supports both single file and chunked output modes.

Single file output

When the path ends with a specific filename and extension (e.g., /path/file.parquet):

  • Ascend validates that the file extension matches the formatter's expected extension (e.g., .parquet for the parquet formatter)
  • All data is written to exactly one file
  • Ideal for smaller datasets or when downstream systems expect a single file

Chunked output

When the path ends with a trailing slash (e.g., /path/):

  • Data is automatically split into multiple chunk files
  • Files are named with the pattern part_<chunkid>.<extension>
  • Default chunk size is 500,000 rows per file

Example configurations:

# Single file output
path: /data/my_snapshot.parquet

# Chunked output
path: /data/my_snapshot/
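
The path rules for the snapshot strategy can be sketched in Python as follows. The function name resolve_snapshot_output and the returned mode strings are hypothetical. Rejecting a path with neither a trailing slash nor an extension is also an assumption, since that case is not described here.

```python
from pathlib import PurePosixPath


def resolve_snapshot_output(path: str, formatter_extension: str) -> str:
    """Hypothetical helper: return "chunked" or "single_file" for a snapshot path."""
    # A trailing slash selects chunked output.
    if path.endswith("/"):
        return "chunked"
    suffix = PurePosixPath(path).suffix  # e.g. ".parquet", or "" when there is none
    if not suffix:
        # Assumption: this case is not covered by the documentation, so reject it.
        raise ValueError(f"path {path!r} has neither a trailing slash nor an extension")
    # The extension must match the formatter's expected extension.
    if suffix.lstrip(".") != formatter_extension:
        raise ValueError(
            f"extension {suffix!r} does not match formatter {formatter_extension!r}"
        )
    return "single_file"


single_mode = resolve_snapshot_output("/data/my_snapshot.parquet", "parquet")
chunked_mode = resolve_snapshot_output("/data/my_snapshot/", "parquet")
```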

Partitioned write strategy

The partitioned write strategy updates only modified partitions while preserving existing partitions. This strategy always produces chunked output for optimal performance.

Behavior: Each partition directory contains multiple chunk files named data_<chunkid>.<extension>

Example output structure:

/data/
  date=2024-01-01/
    data_001.parquet
    data_002.parquet
  date=2024-01-02/
    data_001.parquet
    data_002.parquet
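
To make the update-only-modified-partitions behavior concrete, here is a small in-memory sketch. It models the storage layout as a dictionary from partition directory to file names. The function name partitioned_write and the three-digit chunk IDs are assumptions for illustration, not Ascend's API.

```python
def partitioned_write(
    existing: dict[str, list[str]],
    modified: dict[str, int],
    extension: str = "parquet",
) -> dict[str, list[str]]:
    """Hypothetical helper: rewrite modified partitions, keep the others as-is.

    `modified` maps a partition directory name to the number of chunk files
    produced for it in this run.
    """
    # Start from a copy of the current layout so untouched partitions are preserved.
    layout = {partition: list(files) for partition, files in existing.items()}
    for partition, chunk_count in modified.items():
        # Assumption: chunk IDs are three digits wide and start at 001.
        layout[partition] = [
            f"data_{chunk_id:03d}.{extension}" for chunk_id in range(1, chunk_count + 1)
        ]
    return layout


existing = {
    "date=2024-01-01": ["data_001.parquet", "data_002.parquet"],
    "date=2024-01-02": ["data_001.parquet", "data_002.parquet"],
}
# Only date=2024-01-02 changed in this run and now produces three chunks.
updated = partitioned_write(existing, {"date=2024-01-02": 3})
```

In this sketch the date=2024-01-01 partition keeps its two files, while date=2024-01-02 is replaced with three new chunk files.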

File naming conventions

Single file output

<your_specified_path_and_filename>

Example: /data/output/my_snapshot.parquet

Chunked output (snapshot strategy)

<path_prefix>/part_<chunkid>.<extension>

Examples:

  • /data/output/part_001.parquet
  • /data/output/part_002.parquet

Chunked output (partitioned strategy)

<partition_path>/data_<chunkid>.<extension>

Examples:

  • /data/date=2024-01-01/data_001.parquet
  • /data/date=2024-01-01/data_002.parquet

Naming conventions:

  • <chunkid> is a zero-padded sequential identifier (e.g., 001, 002, 003)
  • <extension> matches your specified formatter (parquet, csv, json)
  • Chunk size defaults to 500,000 rows but can be customized via part_file_rows
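
As a rough illustration of how row count, chunk size, and file names relate, the sketch below derives the expected chunk file names for a dataset. The function chunk_file_names is hypothetical. It assumes that a final partial chunk gets its own file and that chunk IDs are three digits wide, neither of which is confirmed here.

```python
import math

DEFAULT_PART_FILE_ROWS = 500_000  # documented default chunk size


def chunk_file_names(
    total_rows: int,
    extension: str,
    prefix: str = "part",
    part_file_rows: int = DEFAULT_PART_FILE_ROWS,
) -> list[str]:
    """Hypothetical helper: list the chunk file names for a dataset of total_rows."""
    if part_file_rows <= 0:
        raise ValueError("part_file_rows must be positive")
    # Assumption: a final partial chunk is written to its own file.
    chunk_count = math.ceil(total_rows / part_file_rows)
    return [f"{prefix}_{chunk_id:03d}.{extension}" for chunk_id in range(1, chunk_count + 1)]


default_names = chunk_file_names(1_200_000, "parquet")
custom_names = chunk_file_names(1_200_000, "csv", prefix="data", part_file_rows=1_000_000)
```

With the default of 500,000 rows, 1,200,000 rows map to three files. Raising part_file_rows to 1,000,000 reduces that to two.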

Next steps

Ready to implement these write strategies? See the guide for your cloud storage platform.