Write strategies
Choose the right write strategy to optimize how Ascend writes data to your blob storage systems. Each strategy is designed for specific use cases, from complete data refreshes to incremental updates and single-file outputs.
Ascend supports three primary write strategies:
Full write strategy (default)
The full write strategy replaces all data in the target location during each Flow run, ensuring a complete refresh. This strategy treats the data as a single logical unit rather than operating with awareness of chunks or partitions.
Behavior: Always produces multiple chunk files with the naming pattern part_<chunkid>.<extension>
Example output: part_001.parquet, part_002.parquet, part_003.parquet, etc.
This is the default write strategy used when no strategy is explicitly specified.
Snapshot write strategy
The snapshot strategy provides flexible output options based on your path specification. This is the only strategy that supports both single file and chunked output modes.
Single file output
When the path ends with a specific filename and extension (e.g., /path/file.parquet):
- Ascend validates that the file extension matches the formatter's expected extension
- All data is written to exactly one file
- Ideal for smaller datasets or when downstream systems expect a single file
Path validation: Ascend ensures that the extension matches the formatter (e.g., .parquet for the parquet formatter).
Chunked output
When the path ends with a trailing slash (e.g., /path/):
- Data is automatically split into multiple chunk files
- Files are named with the pattern part_<chunkid>.<extension>
- Default chunk size is 500,000 rows per file
Example configurations:
# Single file output
path: /data/my_snapshot.parquet
# Chunked output
path: /data/my_snapshot/
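The path rules above can be sketched in Python. This is an illustration of the described behavior, not Ascend's implementation: `resolve_snapshot_output` and its arguments are hypothetical names. The docs do not say how a path with neither a trailing slash nor an extension is handled, so this sketch assumes it is rejected.

```python
from pathlib import PurePosixPath


def resolve_snapshot_output(path: str, formatter_extension: str) -> str:
    """Classify a snapshot target path as "single_file" or "chunked".

    Hypothetical helper for illustration; not part of Ascend's API.
    """
    # A trailing slash selects chunked output.
    if path.endswith("/"):
        return "chunked"

    # Otherwise the final path component must carry the formatter's extension.
    # Assumption: a path with no extension at all is rejected as well.
    suffix = PurePosixPath(path).suffix.lstrip(".")
    if suffix != formatter_extension:
        raise ValueError(
            f"expected a .{formatter_extension} file, got {path!r}"
        )
    return "single_file"


print(resolve_snapshot_output("/data/my_snapshot.parquet", "parquet"))  # single_file
print(resolve_snapshot_output("/data/my_snapshot/", "parquet"))         # chunked
```

A mismatch such as `/data/my_snapshot.csv` with the parquet formatter raises `ValueError` in this sketch, mirroring the extension validation described for single file output.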
Partitioned write strategy
The partitioned write strategy updates only modified partitions while preserving existing partitions. This strategy always produces chunked output for optimal performance.
Behavior: Each partition directory contains multiple chunk files named data_<chunkid>.<extension>
Example output structure:
/data/
  date=2024-01-01/
    data_001.parquet
    data_002.parquet
  date=2024-01-02/
    data_001.parquet
    data_002.parquet
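To make the "update only modified partitions" behavior concrete, here is a minimal Python sketch that models storage as a dictionary from partition directory to file names. The function `apply_partitioned_write` is hypothetical, and the sketch assumes that a modified partition has its files replaced as a whole; the docs do not spell out how Ascend detects or rewrites a modified partition.

```python
def apply_partitioned_write(existing: dict, incoming: dict) -> dict:
    """Return the storage layout after a partitioned write.

    existing: partition directory -> files currently in storage
    incoming: partition directory -> files produced by this run
              (only partitions that were modified)
    Hypothetical model for illustration; not Ascend's implementation.
    """
    # Start from what is already stored so untouched partitions survive.
    result = {partition: list(files) for partition, files in existing.items()}
    # Assumption: a modified partition's files are replaced as a whole.
    for partition, files in incoming.items():
        result[partition] = list(files)
    return result


existing = {
    "date=2024-01-01": ["data_001.parquet", "data_002.parquet"],
    "date=2024-01-02": ["data_001.parquet"],
}
incoming = {
    "date=2024-01-02": ["data_001.parquet", "data_002.parquet"],
}

after = apply_partitioned_write(existing, incoming)
for partition in sorted(after):
    print(partition, after[partition])
```

Here `date=2024-01-01` is carried over unchanged, while `date=2024-01-02` ends up with the two files written by the current run.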
File naming conventions
Single file output
<your_specified_path_and_filename>
Example: /data/output/my_snapshot.parquet
Chunked output (full and snapshot strategies)
<path_prefix>/part_<chunkid>.<extension>
Examples:
/data/output/part_001.parquet
/data/output/part_002.parquet
Chunked output (partitioned strategy)
<partition_path>/data_<chunkid>.<extension>
Examples:
/data/date=2024-01-01/data_001.parquet
/data/date=2024-01-01/data_002.parquet
Naming conventions:
- <chunkid> is a zero-padded sequential identifier (e.g., 001, 002, 003)
- <extension> matches your specified formatter (parquet, csv, json)
- Chunk size defaults to 500,000 rows but can be customized via part_file_rows
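As a rough illustration of how these conventions combine, the Python sketch below derives chunk file names from a row count. The helper `chunk_file_names` is hypothetical. The three-digit padding and the starting ID of 001 are inferred from the examples above, and the sketch assumes that a final partial chunk gets its own file.

```python
def chunk_file_names(total_rows: int, extension: str,
                     prefix: str = "part",
                     part_file_rows: int = 500_000) -> list:
    """List the chunk file names expected for a given number of rows.

    Hypothetical helper for illustration; not part of Ascend's API.
    """
    if total_rows < 0 or part_file_rows <= 0:
        raise ValueError("total_rows must be >= 0 and part_file_rows > 0")
    # Ceiling division: assumes a trailing partial chunk still gets a file.
    n_chunks = (total_rows + part_file_rows - 1) // part_file_rows
    # Assumption: IDs start at 001 and are padded to three digits.
    return [f"{prefix}_{i:03d}.{extension}" for i in range(1, n_chunks + 1)]


print(chunk_file_names(1_200_000, "parquet"))
# ['part_001.parquet', 'part_002.parquet', 'part_003.parquet']

print(chunk_file_names(1_200_000, "csv", prefix="data", part_file_rows=1_000_000))
# ['data_001.csv', 'data_002.csv']
```

With the default of 500,000 rows per file, 1,200,000 rows span three chunks. Raising part_file_rows to 1,000,000 reduces that to two.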
Next steps
Ready to implement these write strategies? Check out the specific guides for your cloud storage platform: