Version: 3.0.0

Flow

A flow is the primary unit of execution in Ascend and contains a collection of components assembled into a directed acyclic graph (DAG).

Examples​

```yaml
flow:
  name: "SimpleFlow"
  description: "This is a simple flow with only a name and description."
```

Flow​

Below are the properties for the Flow. Each property links to its details section further down the page.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| flow | | | Yes | |

Property Details​

FlowOptions​

Defines the options for a Flow

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| parameters | | object with property values of type None | No | Dictionary of parameters to use for the resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | A brief description of what the flow does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect system behavior, but it may be helpful when analyzing project resources. |
| name | | string | Yes | The name of the flow. |
| data_plane | | | No | Data plane to use for the flow. |
| version | | string | No | The version of the flow. |
| bootstrap | | string | No | Bootstrap command to run within the Docker container. |
| runner | ascend | string | No | Runner id to use for running the flow. Defaults to 'ascend'. |
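Putting several of these options together, a fuller flow definition might look like the following sketch (all names and values are illustrative, and the nesting of `data_plane` under the flow follows the DataPlane properties described below):

```yaml
flow:
  name: "DailySalesFlow"
  description: "Loads and transforms daily sales data."
  version: "1.0.0"
  runner: ascend                 # optional; defaults to 'ascend'
  parameters:
    target_date: "2024-01-01"    # illustrative parameter
  data_plane:
    connection_name: "warehouse_connection"  # illustrative connection name
```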

ConfigFilter​

A filter used to target configuration settings to a specific flow and/or component.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| kind | | string ("Flow", "Component") | Yes | The kind of the resource to apply the config to. |
| name | | Any of: string, array[string], array[None] | Yes | Name of the resource to apply the config to. |
| flow_name | | string | No | Name of the flow to apply the config to. |
| spec | | Any of: ComponentSpec, FlowSpec | No | Dictionary of parameters to use for the resource. |
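As a sketch, a ConfigFilter typically appears as an entry in a flow's `defaults` list, pairing the filter fields with a spec to apply (the component names below are illustrative):

```yaml
flow:
  name: "FilteredFlow"
  defaults:
    - kind: Component
      name:
        - "read_sales"       # illustrative component names
        - "read_customers"
      spec:
        skip: true           # ComponentSpec applied to the matched components
```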

ComponentSpec​

Specification for configuration applied to a component at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| skip | | boolean | No | A boolean flag indicating whether to skip processing for the component. |
| retry_strategy | | | No | The retry strategy configuration options for the component if any exceptions are encountered. |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component. |
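A minimal sketch of a ComponentSpec applied through a config filter, combining the properties above (the component name is illustrative):

```yaml
defaults:
  - kind: Component
    name: "transform_orders"   # illustrative component name
    spec:
      skip: false
      retry_strategy:
        stop_after_attempt: 3  # retry up to 3 attempts on exceptions
```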

FlowSpec​

Specification for configuration applied to a flow at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | | No | The data plane that will be used for the flow at runtime. |

DataPlane​

The external warehouse where data is persisted throughout the flow runs, and where primary computation on the data itself occurs.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this flow. The prefix may include database/project/etc. and schema/dataset/etc. where applicable. If not provided, metadata tables are stored alongside the output data tables per the data plane's connection configuration. |
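A minimal sketch of a data plane configuration using both properties (the connection name and prefix are illustrative; the exact database/schema form of the prefix depends on the data plane):

```yaml
data_plane:
  connection_name: "warehouse_connection"          # illustrative connection name
  metadata_storage_location_prefix: "analytics.meta_"  # illustrative database.schema prefix
```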

RegexFilter​

A filter used to target resources based on a regex pattern.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| regex | | string | Yes | The regex to filter the resources. |

BigQueryDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| bigquery | | | Yes | BigQuery configuration options. |

BigQueryDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

BigQueryRangePartitioning​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| range | | | Yes | Range partitioning options. |

BigQueryTimePartitioning​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
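Combining the BigQuery options above, a time-partitioned, clustered table might be configured as in this sketch (field names are illustrative; the exact nesting under `partition_by` is an assumption based on the property tables):

```yaml
data_plane:
  bigquery:
    partition_by:
      field: "event_ts"      # BigQueryTimePartitioning: partition by day
      granularity: "DAY"
    cluster_by:
      - "customer_id"        # illustrative clustering key
```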

DatabricksDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| databricks | cluster_by: null, pyspark_job_cluster_id: null, table_properties: null | | No | Databricks configuration options. |

DatabricksDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your data plane. |
| pyspark_job_cluster_id | | string | No | The ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
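A sketch of a Databricks data plane configuration using the options above (the cluster ID and key are illustrative; `delta.appendOnly` is a standard Delta table property):

```yaml
data_plane:
  databricks:
    pyspark_job_cluster_id: "0123-456789-abcdefgh"  # illustrative cluster ID
    cluster_by:
      - "customer_id"                               # illustrative clustering key
    table_properties:
      delta.appendOnly: "true"
```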

DuckdbDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| duckdb | | | No | Duckdb configuration options. |

DuckDbDataPlaneOptions​

No properties defined.

FabricDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| fabric | spark_session_config: null | | No | Fabric configuration options. |

FabricDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | | No | Spark session configuration. |

RangeOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |

SnowflakeDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| snowflake | | | Yes | Snowflake configuration options. |

SnowflakeDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
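A minimal sketch of a Snowflake data plane configuration (the clustering key is illustrative):

```yaml
data_plane:
  snowflake:
    cluster_by:
      - "order_date"   # illustrative clustering key
```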

SynapseDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| synapse | spark_session_config: null | | No | Synapse configuration options. |

SynapseDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | | No | Spark session configuration. |

LivySparkSessionConfig​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| pool | | string | No | The pool to use for the Spark session. |
| driver_memory | | string | No | The memory to use for the Spark driver. |
| driver_cores | | integer | No | The number of cores to use for the Spark driver. |
| executor_memory | | string | No | The memory to use for the Spark executor. |
| executor_cores | | integer | No | The number of cores to use for each executor. |
| num_executors | | integer | No | The number of executors to use for the Spark session. |
| session_key_override | | string | No | The key to use for the Spark session. |
| max_concurrent_sessions | | integer | No | The maximum number of concurrent sessions of this spec to create. |
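These session options apply to the Fabric and Synapse data planes via their `spark_session_config` property. A sketch for Fabric (the pool name and sizing values are illustrative; memory strings follow the usual Spark format):

```yaml
data_plane:
  fabric:
    spark_session_config:
      pool: "default"        # illustrative pool name
      driver_memory: "8g"
      driver_cores: 4
      executor_memory: "8g"
      executor_cores: 4
      num_executors: 2
```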

ResourceMetadata​

Meta information of a resource. In most cases it doesn't affect system behavior, but it may be helpful when analyzing project resources.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| source | | | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | UUID of the event associated with the creation of this resource. |

ResourceLocation​

The origin or source information for the resource.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | | string | Yes | Path within the repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within the path file where the resource is defined. |

RetryStrategy​

Retry strategy configuration for component operations. This configuration leverages the tenacity library to implement robust retry mechanisms, and the configuration options map directly to tenacity's retry parameters. Details on the tenacity library can be found here: https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api

The current implementation includes:

- stop_after_attempt: maximum number of retry attempts.
- stop_after_delay: give up on retries one attempt before the delay would be exceeded.

At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| stop_after_attempt | | integer | No | The number of attempts before giving up. If None is set, retrying will not stop after any number of attempts. |
| stop_after_delay | | integer | No | Give up on retries one attempt before the delay would be exceeded. If None is set, retrying will not stop after any delay. |
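A sketch of a retry strategy inside a component spec, combining both stopping conditions (the delay value assumes tenacity's usual unit of seconds):

```yaml
spec:
  retry_strategy:
    stop_after_attempt: 3   # stop after 3 attempts
    stop_after_delay: 60    # or stop before exceeding 60 seconds
```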