Project
A Project is a group of related Connections, Flows/Components, Profiles, Vaults, Automations and other code/configuration artifacts. Project files define the mapping of filesystem paths to different kinds of artifacts that the platform can access when running Flows for the Project.
Examples
- simple_project_name_description.yaml
- custom_project_with_folders.yaml
- project_with_version_and_folders.yaml
- project_with_defaults.yaml

```yaml
# simple_project_name_description.yaml
project:
  name: SimpleProject
  description: A simple project with only a name and description.
```

```yaml
# custom_project_with_folders.yaml
project:
  name: CustomProject
  description: A project with custom source and test folders.
  sources:
    - custom_src/
  tests:
    - custom_tests/
```

```yaml
# project_with_version_and_folders.yaml
project:
  name: MyProject
  description: A project with specific version and multiple connection and flow folders.
  version: "1.0.0"
  connections:
    - connections/folder1/
    - connections/folder2/
  flows:
    - flows/folder1/
    - flows/folder2/
```
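The fourth example file listed above, project_with_defaults.yaml, has no body on this page. The sketch below is an assumption of what it might contain, based on the defaults, ConfigFilter, and ComponentSpec schemas documented later; all resource names are hypothetical.

```yaml
# project_with_defaults.yaml -- illustrative sketch, not from the source
project:
  name: ProjectWithDefaults
  description: A project that applies default configs to matching resources.
  defaults:
    - kind: Component          # ConfigFilter: target Components
      name: stg_orders         # hypothetical Component name
      flow_name: nightly_load  # hypothetical Flow name
      spec:                    # ComponentSpec applied at runtime
        skip: true
```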
Project
Below are the properties for the Project. Each property links to its details section further down the page.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| project | | ProjectOptions | Yes | Project options. |
Property Details
ProjectOptions
Options that can be specified for a Project.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| pip_packages | | array[string] | No | Python PIP packages to install. |
| parameters | | object | No | Dictionary of parameters to use for the resource. |
| defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the Project does. |
| metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze Project resources. |
| name | | string | Yes | The name of the Project. |
| version | | string | No | Project version. |
| connections | ['connections/'] | array[string] | No | List of Connection definition folders used in the Project. |
| flows | ['flows/'] | array[string] | No | List of Flow definition folders used in the Project. |
| profiles | ['profiles/'] | array[string] | No | List of Profile definition folders used in the Project. |
| sources | ['src/'] | array[string] | No | List of source definition folders used in the Project. |
| tests | ['tests/'] | array[string] | No | List of test definition folders used in the Project. |
| vaults | ['vaults/'] | array[string] | No | List of Vault definition folders used in the Project. |
| actions | ['actions/'] | array[string] | No | List of Action definition folders used in the Project. |
| automations | ['automations/'] | array[string] | No | List of Automation definition folders used in the Project. |
| sensors | ['sensors/'] | array[string] | No | List of Sensor definition folders used in the Project. |
| ssh_tunnels | ['ssh_tunnels/'] | array[string] | No | List of SSH tunnel definition folders used in the Project. |
| applications | ['applications/'] | array[string] | No | List of Application definition folders used in the Project. |
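As a minimal sketch, pip_packages and parameters can be combined with the folder mappings like so; the package names, parameter keys, and values below are assumptions, not from the source.

```yaml
# Illustrative only: packages and parameters are hypothetical
project:
  name: ParameterizedProject
  description: A project with extra PIP packages and shared parameters.
  pip_packages:
    - requests
    - pandas
  parameters:
    target_schema: analytics
  flows:
    - flows/
```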
ConfigFilter
Filter used to target configuration settings to a specific Flow and/or Component.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| kind | | string ("Flow", "Component") | Yes | Resource kind to target with this configuration. |
| name | | Any of: string, array[string], array[RegexFilter] | Yes | Name of the resource to target with this configuration. |
| flow_name | | Any of: string, array[string], array[RegexFilter] | No | Name of the Flow to target with this configuration. |
| spec | | Any of: FlowSpec, ComponentSpec | No | Dictionary of parameters to use for the resource. |
ComponentSpec
Specification for configuration applied to a component at runtime based on the config filter.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component. |
| retry_strategy | | RetryStrategy | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the Component. |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
FlowSpec
Specification for configuration applied to a Flow at runtime based on the config filter.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | DataPlane | No | The Data Plane that will be used for the Flow at runtime. |
| runner | | RunnerConfig | No | Runner configuration. |
| component_concurrency | | integer | No | Maximum number of concurrent Components to run within this Flow. |
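Assuming FlowSpec values are supplied through a defaults entry's spec, as the ConfigFilter section above suggests, a Flow-level default might look like this sketch with hypothetical names:

```yaml
# Illustrative FlowSpec inside a project defaults entry
defaults:
  - kind: Flow
    name: nightly_load           # hypothetical Flow name
    spec:
      component_concurrency: 4   # at most 4 Components run concurrently
      runner:
        size: Medium             # one of the standard RuntimeSize tiers
```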
DataPlane
The external warehouse where data is persisted throughout the Flow runs, and where primary computation on the data itself occurs.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this Flow. The prefix may include database/project/etc. and schema/dataset/etc. where applicable. If not provided, metadata tables are stored alongside the output data tables per the Data Plane's Connection configuration. |
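A sketch of the generic DataPlane settings on a Flow; the Connection name and prefix format below are assumptions.

```yaml
# Illustrative DataPlane selection for a Flow
spec:
  data_plane:
    connection_name: warehouse_prod   # hypothetical Connection name
    metadata_storage_location_prefix: ANALYTICS.METADATA  # database/schema prefix; exact format depends on the Data Plane
```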
RegexFilter
A filter used to target resources based on a regex pattern.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| regex | | string | Yes | The regex to filter the resources. |
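The schema source is ambiguous about where RegexFilter plugs in; assuming it can stand in for a literal name in a ConfigFilter's name list, pattern-based targeting might look like:

```yaml
# Assumption: a RegexFilter object in place of a literal resource name
defaults:
  - kind: Component
    name:
      - regex: ".*_staging"   # matches Component names ending in _staging
    spec:
      skip: true
```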
RunnerConfig
Configuration for the Flow runner.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| size | | Any of: RuntimeSize, CustomRuntimeSize | No | Runtime size configuration. Can be: (1) a tier name string (X-Small, Small, Medium, Large, X-Large), or (2) a CustomRuntimeSize object with tier-based or fully custom resources. |
CustomRuntimeSize
Runtime size configuration with flexible resource specification. Two modes are supported:
1. Tier-based: specify a tier with optional resource overrides.
2. Fully custom: specify CPU directly with optional memory/disk.
Either 'tier' or 'cpu' must be provided (or both).
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| tier | | RuntimeSize | No | Base size tier (X-Small, Small, Medium, Large, X-Large). Required unless 'cpu' is specified. |
| cpu | | string | No | CPU allocation in whole cores (e.g., '1', '4', '8'). Required unless 'tier' is specified. |
| memory | | string | No | Memory allocation. Use 'high' for tier-based doubling, or specify an exact value with a unit suffix (e.g., '32Gi', '4G', '512Mi'). |
| disk | | string | No | Disk allocation with a unit suffix (e.g., '100Gi', '1Ti', '500G'). |
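The two modes might be written as follows; the values are illustrative.

```yaml
# Mode 1: tier-based with a memory override
runner:
  size:
    tier: Large
    memory: high     # doubles the tier's default memory
---
# Mode 2: fully custom resources, no tier
runner:
  size:
    cpu: "4"
    memory: 32Gi
    disk: 100Gi
```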
RuntimeSize
Enumeration of standard runtime size tiers. Each tier corresponds to specific resource allocations (CPU, memory, disk).
No properties defined.
BigQueryDataPlane
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |
BigQueryDataPlaneOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
BigQueryRangePartitioning
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| field | | string | Yes | Field to partition by. |
| range | | RangeOptions | Yes | Range partitioning options. |
BigQueryTimePartitioning
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
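Putting the BigQuery options together, a time-partitioned, clustered table might be configured like this sketch; the column names are hypothetical, and the nesting under spec.data_plane follows the ComponentSpec schema above.

```yaml
# Illustrative BigQuery Data Plane options on a Component
spec:
  data_plane:
    bigquery:
      partition_by:            # BigQueryTimePartitioning variant
        field: event_date      # hypothetical column
        granularity: DAY
      cluster_by:
        - customer_id          # hypothetical clustering key
```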
DatabricksDataPlane
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| databricks | `{cluster_by: null, pyspark_job_cluster_id: null, table_properties: null}` | DatabricksDataPlaneOptions | No | Databricks configuration options. |
DatabricksDataPlaneOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your Data Plane. |
| pyspark_job_cluster_id | | string | No | ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
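A sketch of the Databricks options; the table property shown is a standard Delta property, and the cluster ID is hypothetical.

```yaml
# Illustrative Databricks Data Plane options
spec:
  data_plane:
    databricks:
      table_properties:
        delta.autoOptimize.optimizeWrite: "true"
      pyspark_job_cluster_id: "0123-456789-abcdef"   # hypothetical cluster ID
      cluster_by:
        - customer_id
```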
DuckdbDataPlane
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| duckdb | `{ducklake_data_table_compaction: {small_file_count_threshold: 50, small_file_ratio_threshold: 0.25, small_file_record_count_limit: 100000}, ducklake_metadata_table_compaction: {small_file_count_threshold: 10, small_file_ratio_threshold: null, small_file_record_count_limit: 10}}` | DuckDbDataPlaneOptions | No | DuckDB configuration options. |
DuckDbDataPlaneOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| ducklake_metadata_table_compaction | `{small_file_count_threshold: 10, small_file_ratio_threshold: null, small_file_record_count_limit: 10}` | DuckLakeTableCompactionSettings | No | Settings for compacting metadata tables. If present, metadata table compaction is enabled. |
| ducklake_data_table_compaction | `{small_file_count_threshold: 50, small_file_ratio_threshold: 0.25, small_file_record_count_limit: 100000}` | DuckLakeTableCompactionSettings | No | Settings for compacting data tables. If present, data table compaction is enabled. |
DuckLakeTableCompactionSettings
Settings for DuckLake table compaction.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| small_file_record_count_limit | 10 | integer | No | Files with fewer records than this limit are considered 'small'. |
| small_file_count_threshold | 10 | integer | No | Run manual table compaction if the number of files with fewer than small_file_record_count_limit records exceeds this threshold. |
| small_file_ratio_threshold | | number | No | Fraction (0.0-1.0) of small files relative to total files. If set, both the absolute count AND the ratio must pass for compaction to be triggered. If None, only the absolute count check is performed. |
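A sketch that mirrors the documented defaults, enabling compaction for both table kinds:

```yaml
# Illustrative DuckDB Data Plane compaction settings (values match the documented defaults)
spec:
  data_plane:
    duckdb:
      ducklake_data_table_compaction:
        small_file_count_threshold: 50
        small_file_ratio_threshold: 0.25
        small_file_record_count_limit: 100000
      ducklake_metadata_table_compaction:
        small_file_count_threshold: 10
        small_file_record_count_limit: 10
```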
FabricDataPlane
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| fabric | `{spark_session_config: null}` | FabricDataPlaneOptions | No | Fabric configuration options. |
FabricDataPlaneOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |
RangeOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |
SnowflakeDataPlane
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| snowflake | | SnowflakeDataPlaneOptions | Yes | Snowflake configuration options. |
SnowflakeDataPlaneOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
SynapseDataPlane
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| synapse | `{spark_session_config: null}` | SynapseDataPlaneOptions | No | Synapse configuration options. |
SynapseDataPlaneOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |
LivySparkSessionConfig
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| pool | | string | No | Pool to use for the Spark session. |
| driver_memory | | string | No | Memory to use for the Spark driver. |
| driver_cores | | integer | No | Number of cores to use for the Spark driver. |
| executor_memory | | string | No | Memory to use for each Spark executor. |
| executor_cores | | integer | No | Number of cores to use for each executor. |
| num_executors | | integer | No | Number of executors to use for the Spark session. |
| session_key_override | | string | No | Key to use for the Spark session. |
| max_concurrent_sessions | | integer | No | Maximum number of concurrent sessions allowed for this configuration. |
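Assuming the same spec.data_plane nesting as the other Data Planes, a Synapse (or Fabric) Spark session might be tuned like this sketch with hypothetical values:

```yaml
# Illustrative Livy Spark session configuration for a Synapse Data Plane
spec:
  data_plane:
    synapse:
      spark_session_config:
        pool: small-pool       # hypothetical pool name
        driver_memory: 8g
        driver_cores: 2
        executor_memory: 8g
        executor_cores: 4
        num_executors: 2
```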
DataMaintenance
Data maintenance configuration options for Components.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| enabled | | boolean | No | Boolean flag indicating whether data maintenance is enabled for the Component. |
| manual_table_compaction | | boolean | No | Boolean flag indicating whether manual table compaction is enabled for the Component. This is currently only relevant for DuckLake Data Planes. |
| manual_table_compaction_record_count_threshold | 10 | integer | No | Files with fewer than this number of records are considered when determining whether to perform manual table compaction. This is currently only relevant for DuckLake Data Planes. |
| manual_table_compaction_file_count_threshold | 10 | integer | No | Run manual table compaction if the number of files with fewer than manual_table_compaction_record_count_threshold records exceeds this threshold. This is currently only relevant for DuckLake Data Planes. |
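A sketch of data maintenance settings on a ComponentSpec; the thresholds mirror the documented defaults.

```yaml
# Illustrative DataMaintenance settings for a DuckLake-backed Component
spec:
  data_maintenance:
    enabled: true
    manual_table_compaction: true
    manual_table_compaction_record_count_threshold: 10
    manual_table_compaction_file_count_threshold: 10
```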
ResourceMetadata
Meta information of a resource. In most cases, it doesn't affect the system behavior but may be helpful to analyze Project resources.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| source | | ResourceLocation | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | Event UUID associated with the creation of this resource. |
ResourceLocation
The origin or source information for the resource.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| path | | string | Yes | Path within repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within the path file where the resource is defined. |
RetryStrategy
Retry strategy configuration for Component operations. This configuration leverages the tenacity library to implement robust retry mechanisms, and the configuration options map directly to tenacity's retry parameters; details on the tenacity library can be found at https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api. The current implementation includes:
- stop_after_attempt: maximum number of retry attempts.
- stop_after_delay: give up on retries one attempt before the delay would be exceeded.
At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| stop_after_attempt | | integer | No | Number of retry attempts before giving up. If set to None, it will not stop after any number of attempts. |
| stop_after_delay | | integer | No | Maximum time (in seconds) to spend on retries before giving up. If set to None, it will not stop after any time delay. |
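A sketch combining both stop conditions, so retries end after 5 attempts or 300 seconds, whichever comes first:

```yaml
# Illustrative RetryStrategy on a ComponentSpec
spec:
  retry_strategy:
    stop_after_attempt: 5
    stop_after_delay: 300    # seconds
```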