A Project is a group of related Connections, Flows/Components, Profiles, Vaults, Automations, and other code and configuration artifacts.
Project files define the mapping of filesystem paths to different kinds of artifacts that the platform can access when running Flows for the Project.
```yaml
project:
  name: SimpleProject
  description: A simple project with only a name and description.
```
```yaml
project:
  name: CustomProject
  description: A project with custom source and test folders.
  sources:
    - custom_src/
  tests:
    - custom_tests/
```
```yaml
project:
  name: MyProject
  description: A project with a specific version and multiple connection and flow folders.
  version: "1.0.0"
  connections:
    - connections/folder1/
    - connections/folder2/
  flows:
    - flows/folder1/
    - flows/folder2/
```
```yaml
project:
  name: MyProject
  description: A project with default configurations for flows and components.
  version: "1.0.0"
  connections:
    - connections/folder1/
    - connections/folder2/
  flows:
    - flows/folder1/
    - flows/folder2/
  defaults:
    - kind: Flow
      name: "^flow-.*$"
      spec:
        data_plane:
```
The external warehouse where data is persisted throughout the Flow runs, and where primary computation on the data itself occurs.
| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this Flow. The prefix may include database/project/etc. and schema/dataset/etc. where applicable. If not provided, metadata tables are stored alongside the output data tables per the Data Plane's Connection configuration. |
| table_properties | | | No | Table properties to include when creating the data table. This setting is equivalent to the `CREATE TABLE ... TBLPROPERTIES` clause. Refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for the properties available for your Data Plane. |
| pyspark_job_cluster_id | | string | No | ID of the compute cluster to use for PySpark jobs. |
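As a hedged illustration, a Flow default that sets these Data Plane properties might look like the following sketch. The connection name, prefix, table property, and cluster ID are placeholder values, not values defined by this document:

```yaml
defaults:
  - kind: Flow
    name: "^flow-.*$"
    spec:
      data_plane:
        connection_name: my_databricks_connection    # placeholder Connection name
        metadata_storage_location_prefix: meta_      # metadata tables get this prefix
        table_properties:
          delta.appendOnly: "true"                   # example Databricks TBLPROPERTIES entry
        pyspark_job_cluster_id: "0123-456789-abcde"  # placeholder cluster ID
```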
Retry strategy configuration for Component operations. Retries are implemented with the tenacity library, and the configuration options map directly to tenacity's retry parameters; see https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api for details. The current implementation includes:

- stop_after_attempt: maximum number of retry attempts.
- stop_after_delay: give up on retries one attempt before the delay would be exceeded.

At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.
| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| stop_after_attempt | | integer | No | Number of retry attempts before giving up. If set to None, retries are not stopped after any number of attempts. |
| stop_after_delay | | integer | No | Maximum time (in seconds) to spend on retries before giving up. If set to None, retries are not stopped after any time delay. |
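For illustration, a retry configuration that combines both stop conditions might look like the sketch below. Only `stop_after_attempt` and `stop_after_delay` come from this document; the enclosing `retry_strategy` key is an assumed placement within a Component's configuration:

```yaml
retry_strategy:
  stop_after_attempt: 3   # give up after 3 attempts...
  stop_after_delay: 60    # ...or before a 61st second of retrying would begin
```

With both set, whichever condition is hit first ends the retries, matching tenacity's behavior when stop conditions are combined.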