Write output formats
When writing data to blob storage systems, Ascend supports three primary write strategies, each optimized for different use cases and data sizes: full, snapshot, and partitioned.
Full write strategy
The full write strategy replaces all data in the target location during each flow run, ensuring a complete refresh. This strategy treats the data as a single logical unit rather than tracking individual chunks or partitions.
Behavior: Always produces multiple chunk files with the naming pattern part_<chunkid>.<extension>
Example output: part_001.parquet, part_002.parquet, part_003.parquet, etc.
Snapshot write strategy
The snapshot strategy provides flexible output options based on your path specification. This is the only strategy that supports both single file and chunked output modes.
Single file output
When the path ends with a specific filename and extension (e.g., /path/file.parquet):
- Ascend validates that the file extension matches the formatter's expected extension
- All data is written to exactly one file
- Ideal for smaller datasets or when downstream systems expect a single file
Path validation: The system ensures the extension matches the formatter (e.g., .parquet for the parquet formatter).
Chunked output
When the path ends with a trailing slash (e.g., /path/):
- Data is automatically split into multiple chunk files
- Files are named with the pattern part_<chunkid>.<extension>
- Default chunk size is 500,000 rows per file
Example configurations:
# Single file output
path: /data/my_snapshot.parquet
# Chunked output
path: /data/my_snapshot/
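The path rule above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not Ascend's implementation: the function name snapshot_output_mode is hypothetical, and rejecting a path that has neither a trailing slash nor an extension is an assumption, since that case is not described here.

```python
from pathlib import PurePosixPath


def snapshot_output_mode(path: str, formatter_extension: str) -> str:
    """Return "chunked" or "single" for a snapshot path (illustrative only)."""
    # A trailing slash selects chunked output.
    if path.endswith("/"):
        return "chunked"

    # Otherwise the path must name a file whose extension matches the formatter.
    suffix = PurePosixPath(path).suffix  # e.g. ".parquet"
    if not suffix:
        # Assumption: this case is not covered by the documentation.
        raise ValueError(f"path {path!r} has no trailing slash and no extension")
    if suffix.lstrip(".") != formatter_extension:
        raise ValueError(
            f"extension {suffix!r} does not match formatter {formatter_extension!r}"
        )
    return "single"


print(snapshot_output_mode("/data/my_snapshot.parquet", "parquet"))
print(snapshot_output_mode("/data/my_snapshot/", "parquet"))
```

The two calls mirror the example configurations: the first resolves to single file output, the second to chunked output.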
Partitioned write strategy
The partitioned write strategy updates only modified partitions while preserving existing partitions. This strategy always produces chunked output for optimal performance.
Behavior: Each partition directory contains multiple chunk files named data_<chunkid>.<extension>
Example output structure:
/data/
date=2024-01-01/
data_001.parquet
data_002.parquet
date=2024-01-02/
data_001.parquet
data_002.parquet
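To make "updates only modified partitions" concrete, here is a minimal Python model of the target location as a mapping from partition directory to chunk files. The function partitioned_write is hypothetical, and two details are assumptions rather than documented behavior: a modified partition has its chunk files replaced as a set, and a partition that did not exist before is simply added.

```python
def partitioned_write(existing, modified):
    """Model a partitioned write: only partitions in `modified` change."""
    # Copy so the caller's view of the existing layout is left untouched.
    result = {partition: list(files) for partition, files in existing.items()}
    for partition, files in modified.items():
        # Assumption: the partition's chunk files are replaced as a whole.
        result[partition] = list(files)
    return result


existing = {
    "date=2024-01-01": ["data_001.parquet", "data_002.parquet"],
    "date=2024-01-02": ["data_001.parquet", "data_002.parquet"],
}
modified = {
    "date=2024-01-02": ["data_001.parquet"],
    "date=2024-01-03": ["data_001.parquet"],
}

result = partitioned_write(existing, modified)
for partition in sorted(result):
    print(partition, result[partition])
```

In this model, date=2024-01-01 is preserved exactly as it was, while date=2024-01-02 is rewritten and date=2024-01-03 is created.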
File naming conventions
Single file output
<your_specified_path_and_filename>
Example: /data/output/my_snapshot.parquet
Chunked output (snapshot strategy)
<path_prefix>/part_<chunkid>.<extension>
Examples:
/data/output/part_001.parquet
/data/output/part_002.parquet
Chunked output (partitioned strategy)
<partition_path>/data_<chunkid>.<extension>
Examples:
/data/date=2024-01-01/data_001.parquet
/data/date=2024-01-01/data_002.parquet
Naming conventions:
- <chunkid> is a zero-padded sequential identifier (e.g., 001, 002, 003)
- <extension> matches your specified formatter (parquet, csv, json)
- Chunk size defaults to 500,000 rows but can be customized via part_file_rows
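As a rough illustration of how these conventions combine, the sketch below estimates the chunk file names for a given row count. The helper chunk_file_names is hypothetical. The three-digit padding and numbering from 001 are inferred from the examples on this page, and the assumption that the final chunk holds the remaining rows is not stated in the documentation.

```python
def chunk_file_names(total_rows, extension, prefix="part", part_file_rows=500_000):
    """List the chunk file names expected for `total_rows` rows (illustrative)."""
    if part_file_rows <= 0:
        raise ValueError("part_file_rows must be positive")
    # Ceiling division: a partially filled final chunk still needs its own file.
    n_chunks = -(-total_rows // part_file_rows)
    # Assumption: IDs start at 001 and are padded to three digits.
    return [f"{prefix}_{i:03d}.{extension}" for i in range(1, n_chunks + 1)]


# Snapshot-style naming with the default chunk size.
print(chunk_file_names(1_200_000, "parquet"))

# Partitioned-style naming with a smaller, customized chunk size.
print(chunk_file_names(150_000, "csv", prefix="data", part_file_rows=100_000))
```

With the default of 500,000 rows per file, 1,200,000 rows yield three chunk files. Lowering part_file_rows increases the number of files for the same data.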
Next steps
Ready to implement these write strategies? Check out the specific guides for your cloud storage platform: