
Write strategies

Choose the right write strategy to optimize how Ascend writes data to your blob storage systems. Each strategy is designed for specific use cases, from complete data refreshes to incremental updates and single-file outputs.

Ascend supports three primary write strategies: full, snapshot, and partitioned.

Full write strategy (default)

The full write strategy replaces all data in the target location during each Flow run, ensuring a complete refresh. This strategy treats the data as a single logical unit rather than operating with awareness of chunks or partitions.

Behavior: Always produces multiple chunk files with the naming pattern part_<chunkid>.<extension>

Example output: part_001.parquet, part_002.parquet, part_003.parquet, etc.

Note: This is the default write strategy, used when no strategy is explicitly specified.

Snapshot write strategy

The snapshot strategy selects its output mode from the path you specify. It is the only strategy that supports both single-file and chunked output.

Single file output

When the path ends with a specific filename and extension (e.g., /path/file.parquet):

  • Ascend validates that the file extension matches the formatter's expected extension (e.g., .parquet for the parquet formatter)
  • All data is written to exactly one file
  • Ideal for smaller datasets or when downstream systems expect a single file
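
The extension check described above can be pictured with a small Python sketch. This is an illustration only, not Ascend's implementation: the function name validate_single_file_path and the EXPECTED_EXTENSION mapping are hypothetical, and the mapping assumes each formatter expects an extension equal to its own name.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from formatter name to expected extension (an assumption).
EXPECTED_EXTENSION = {"parquet": ".parquet", "csv": ".csv", "json": ".json"}


def validate_single_file_path(path: str, formatter: str) -> None:
    """Raise ValueError if the path's extension does not match the formatter."""
    expected = EXPECTED_EXTENSION[formatter]
    actual = PurePosixPath(path).suffix
    if actual != expected:
        raise ValueError(
            f"{path!r} has extension {actual!r}; "
            f"the {formatter} formatter expects {expected!r}"
        )


# A matching extension passes silently.
validate_single_file_path("/data/my_snapshot.parquet", "parquet")
```

A mismatched pair, such as /data/my_snapshot.csv with the parquet formatter, raises ValueError in this sketch. How Ascend itself reports the mismatch is not specified here.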

Chunked output

When the path ends with a trailing slash (e.g., /path/):

  • Data is automatically split into multiple chunk files
  • Files are named with the pattern part_<chunkid>.<extension>
  • Default chunk size is 500,000 rows per file
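
To get a feel for how many chunk files a dataset produces at the default chunk size, here is a rough Python sketch. It assumes rows are split into consecutive chunks of at most 500,000 rows, with any remainder in the last chunk. The exact splitting Ascend performs may differ, and chunk_row_counts is a hypothetical helper name.

```python
DEFAULT_PART_FILE_ROWS = 500_000  # default chunk size stated above


def chunk_row_counts(total_rows: int,
                     part_file_rows: int = DEFAULT_PART_FILE_ROWS) -> list[int]:
    """Return the number of rows in each chunk file (assumed even fill)."""
    if total_rows < 0 or part_file_rows <= 0:
        raise ValueError("total_rows must be >= 0 and part_file_rows > 0")
    full_chunks, remainder = divmod(total_rows, part_file_rows)
    # Full chunks first, then one smaller chunk for any leftover rows.
    return [part_file_rows] * full_chunks + ([remainder] if remainder else [])


print(chunk_row_counts(1_200_000))  # [500000, 500000, 200000]
```

Under these assumptions, a 1.2 million row dataset yields three chunk files.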

Example configurations:

# Single file output
path: /data/my_snapshot.parquet

# Chunked output
path: /data/my_snapshot/
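
The path rule behind these two configurations can be sketched in Python as follows. The function name snapshot_output_mode and the returned labels are hypothetical. The page does not say what happens when a path has neither a trailing slash nor an extension, so the sketch simply rejects that case.

```python
from pathlib import PurePosixPath


def snapshot_output_mode(path: str) -> str:
    """Classify a snapshot path as 'chunked' or 'single_file'."""
    if path.endswith("/"):
        return "chunked"  # trailing slash: multiple part_<chunkid> files
    if PurePosixPath(path).suffix:
        return "single_file"  # filename with an extension: exactly one file
    # Behavior for this case is not documented; rejecting is an assumption.
    raise ValueError(f"cannot infer output mode from {path!r}")


print(snapshot_output_mode("/data/my_snapshot.parquet"))  # single_file
print(snapshot_output_mode("/data/my_snapshot/"))         # chunked
```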

Partitioned write strategy

The partitioned write strategy updates only modified partitions while preserving existing partitions. This strategy always produces chunked output for optimal performance.

Behavior: Each partition directory contains multiple chunk files named data_<chunkid>.<extension>

Example output structure:

/data/
  date=2024-01-01/
    data_001.parquet
    data_002.parquet
  date=2024-01-02/
    data_001.parquet
    data_002.parquet
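
The difference between replacing everything and updating only modified partitions can be modeled with plain dictionaries. The sketch below is a conceptual illustration, not Ascend's implementation: the target location is represented as a mapping from partition directory to chunk file names, and both function names are hypothetical.

```python
Layout = dict[str, list[str]]  # partition directory -> chunk file names


def apply_partitioned_write(existing: Layout, modified: Layout) -> Layout:
    """Replace only the modified partitions; leave the others untouched."""
    result = dict(existing)
    result.update(modified)
    return result


def apply_full_write(existing: Layout, new: Layout) -> Layout:
    """For contrast: a full write discards everything in the target."""
    return dict(new)


existing = {
    "date=2024-01-01": ["data_001.parquet", "data_002.parquet"],
    "date=2024-01-02": ["data_001.parquet"],
}
modified = {"date=2024-01-02": ["data_001.parquet", "data_002.parquet"]}

after = apply_partitioned_write(existing, modified)
print(sorted(after))  # ['date=2024-01-01', 'date=2024-01-02']
```

In this model, date=2024-01-01 keeps its original files, while date=2024-01-02 is rewritten with the new chunk set.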

File naming conventions

Single file output

<your_specified_path_and_filename>

Example: /data/output/my_snapshot.parquet

Chunked output (snapshot strategy)

<path_prefix>/part_<chunkid>.<extension>

Examples:

  • /data/output/part_001.parquet
  • /data/output/part_002.parquet

Chunked output (partitioned strategy)

<partition_path>/data_<chunkid>.<extension>

Examples:

  • /data/date=2024-01-01/data_001.parquet
  • /data/date=2024-01-01/data_002.parquet

Naming conventions:

  • <chunkid> is a zero-padded sequential identifier (e.g., 001, 002, 003)
  • <extension> matches your specified formatter (parquet, csv, json)
  • Chunk size defaults to 500,000 rows but can be customized via part_file_rows
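
The naming patterns above can be reproduced with a short Python helper, which may be handy when writing downstream code that lists or matches output files. This is a sketch under assumptions: the three-digit padding and 1-based numbering are inferred from the examples on this page rather than from a stated rule, and chunk_file_name is a hypothetical function.

```python
# File name prefix per strategy, taken from the patterns on this page.
PREFIX = {"full": "part", "snapshot": "part", "partitioned": "data"}


def chunk_file_name(chunk_id: int, extension: str,
                    strategy: str = "snapshot", width: int = 3) -> str:
    """Build a chunk file name such as part_001.parquet (width is assumed)."""
    return f"{PREFIX[strategy]}_{chunk_id:0{width}d}.{extension}"


print(chunk_file_name(1, "parquet"))                 # part_001.parquet
print(chunk_file_name(2, "parquet", "partitioned"))  # data_002.parquet
```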

Next steps

Ready to implement these write strategies? Check out the specific guides for your cloud storage platform: