Version: 3.0.0

Project

A project is a group of related connections, flows/components, profiles, vaults, automations and other code/configuration artifacts. Project files define the mapping of filesystem paths to different kinds of artifacts that the platform can access when running flows for the project.

Examples

project:
  name: SimpleProject
  description: A simple project with only a name and description.

Project

Below are the properties for the Project. Each property links to the specific details section further down in this page.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| project | | ProjectOptions | Yes | |

Property Details

ProjectOptions

Options that can be specified for a project.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| pip_packages | | array[string] | No | Python PIP packages to install |
| parameters | | object with property values of type None | No | Dictionary of parameters to use for resource. |
| defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | A brief description of what the model does. |
| metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model |
| version | | string | No | The version of the project. |
| connections | ['connections/'] | array[string] | No | List of connection definition folders used in the project. |
| flows | ['flows/'] | array[string] | No | List of flow definition folders used in the project. |
| profiles | ['profiles/'] | array[string] | No | List of profile definition folders used in the project. |
| sources | ['src/'] | array[string] | No | List of source definition folders used in the project. |
| tests | ['tests/'] | array[string] | No | List of test definition folders used in the project. |
| vaults | ['vaults/'] | array[string] | No | List of vault definition folders used in the project. |
| actions | ['actions/'] | array[string] | No | List of action definition folders used in the project. |
| automations | ['automations/'] | array[string] | No | List of automation definition folders used in the project. |
| sensors | ['sensors/'] | array[string] | No | List of sensor definition folders used in the project. |
| ssh_tunnels | ['ssh_tunnels/'] | array[string] | No | List of SSH tunnel definition folders used in the project. |
| applications | ['applications/'] | array[string] | No | List of Application definition folders used in the project. |
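
As an illustration of the options above, the following sketch sets a few optional fields and overrides one folder default. The project name, package names, parameter keys, extra folder name, and version string are hypothetical values chosen for the example.

```yaml
project:
  name: SalesAnalytics            # hypothetical project name
  description: Example project with custom folders and extra packages.
  version: "1.2.0"                # hypothetical; quoted so YAML keeps it a string
  pip_packages:                   # array[string]; package names are examples only
    - requests
    - pyyaml
  parameters:                     # free-form dictionary; the key is hypothetical
    environment: dev
  flows:                          # replaces the default ['flows/']
    - flows/
    - shared_flows/               # hypothetical additional folder
```

Folder properties that are omitted, such as connections or vaults, would keep the defaults listed in the table.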

ConfigFilter

A filter used to target configuration settings to a specific flow and/or component.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| kind | | string ("Flow", "Component") | Yes | The kind of the resource to apply the config to. |
| name | | Any of: string, array[string], RegexFilter, array[RegexFilter] | Yes | Name of the resource to apply the config to. |
| flow_name | | string | No | Name of the flow to apply the config to. |
| spec | | Any of: FlowSpec, ComponentSpec | No | Dictionary of parameters to use for the resource. |
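
A minimal sketch of how ConfigFilter entries might appear under the project's defaults list, with the nesting inferred from the tables on this page. The flow and component names are hypothetical. The skip field comes from ComponentSpec, which lists it without a description, so its runtime effect is not asserted here.

```yaml
project:
  name: SimpleProject
  defaults:
    # Target one component by exact name, scoped to a single flow.
    - kind: Component
      name: raw_orders            # hypothetical component name
      flow_name: daily_sales      # hypothetical flow name
      spec:
        skip: true                # ComponentSpec field; documented default is False
    # Target several components by listing their names (array[string]).
    - kind: Component
      name:
        - raw_orders
        - raw_customers
      spec:
        skip: false
```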

ComponentSpec

Specification for configuration applied to a component at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane, FabricDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for a component. |
| skip | False | boolean | No | |

FlowSpec

Specification for configuration applied to a flow at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | DataPlane | No | The data plane that will be used for the flow at runtime. |

DataPlane

The external warehouse where data is persisted throughout the flow runs, and where primary computation on the data itself occurs.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this flow. The prefix may include database/project/etc and schema/dataset/etc where applicable. If not provided, metadata tables are stored alongside the output data tables per the data plane's connection configuration. |
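
To make the FlowSpec and DataPlane relationship concrete, here is a hedged sketch of a flow-level default that selects a data plane connection. The flow name, connection name, and prefix value are hypothetical, and the exact prefix format expected by a given warehouse is an assumption.

```yaml
project:
  name: SimpleProject
  defaults:
    - kind: Flow
      name: daily_sales                      # hypothetical flow name
      spec:                                  # FlowSpec
        data_plane:                          # DataPlane
          connection_name: warehouse_prod    # hypothetical connection name
          # Hypothetical prefix; per the description it may carry database and schema parts.
          metadata_storage_location_prefix: analytics.meta_
```

If metadata_storage_location_prefix is left out, the description above states that metadata tables are stored alongside the output data tables.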

RegexFilter

A filter used to target resources based on a regex pattern.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| regex | | string | Yes | The regex to filter the resources. |
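
A sketch of RegexFilter used as the name of a ConfigFilter, both as a single filter and as a list of filters. The patterns and column names are hypothetical. This page does not state the regex flavor or whether a pattern must match the whole name, so the anchors below are written explicitly.

```yaml
project:
  name: SimpleProject
  defaults:
    # Single RegexFilter: components whose names start with "stg_".
    - kind: Component
      name:
        regex: "^stg_.*"
      spec:
        data_plane:
          snowflake:
            cluster_by:
              - order_date        # hypothetical column name
    # array[RegexFilter]: two patterns in one filter.
    - kind: Component
      name:
        - regex: "^raw_.*"
        - regex: ".*_tmp$"
      spec:
        skip: true                # ComponentSpec field; listed without a description
```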

BigQueryDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |

BigQueryDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

BigQueryRangePartitioning

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| range | | RangeOptions | Yes | Range partitioning options. |

BigQueryTimePartitioning

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
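
A possible component-level default using BigQuery time partitioning together with clustering keys, with the nesting inferred from the BigQueryDataPlane tables. The component and column names are hypothetical.

```yaml
project:
  name: SimpleProject
  defaults:
    - kind: Component
      name: events                  # hypothetical component name
      spec:
        data_plane:
          bigquery:                 # BigQueryDataPlaneOptions
            partition_by:           # BigQueryTimePartitioning
              field: event_ts       # hypothetical timestamp column
              granularity: DAY      # one of DAY, HOUR, MONTH, YEAR
            cluster_by:
              - customer_id         # hypothetical column name
```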

DatabricksDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| databricks | cluster_by: null, pyspark_job_cluster_id: null, table_properties: null | DatabricksDataPlaneOptions | No | Databricks configuration options. |

DatabricksDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your data plane. |
| pyspark_job_cluster_id | | string | No | The ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
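
A sketch of the Databricks options on a component-level default. The component name, cluster ID, and column name are hypothetical. delta.appendOnly is used only as an example of a Delta table property; because property values are typed as strings, the value is quoted.

```yaml
project:
  name: SimpleProject
  defaults:
    - kind: Component
      name: audit_log                        # hypothetical component name
      spec:
        data_plane:
          databricks:                        # DatabricksDataPlaneOptions
            table_properties:
              delta.appendOnly: "true"       # string value, as the type requires
            pyspark_job_cluster_id: "0123-456789-abcdefgh"   # hypothetical cluster ID
            cluster_by:
              - event_date                   # hypothetical column name
```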

DuckdbDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| duckdb | | DuckDbDataPlaneOptions | No | Duckdb configuration options. |

DuckDbDataPlaneOptions

No properties defined.

FabricDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| fabric | spark_session_config: null | FabricDataPlaneOptions | No | Fabric configuration options. |

FabricDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |

RangeOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |
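
For integer range partitioning on BigQuery, RangeOptions nests under the range property of BigQueryRangePartitioning. The sketch below is inferred from the tables on this page; the component name, column name, and numeric bounds are hypothetical, and whether the end bound is inclusive is not stated here.

```yaml
project:
  name: SimpleProject
  defaults:
    - kind: Component
      name: customers               # hypothetical component name
      spec:
        data_plane:
          bigquery:
            partition_by:           # BigQueryRangePartitioning
              field: customer_id    # hypothetical integer column
              range:                # RangeOptions; all three fields are required
                start: 0
                end: 1000000
                interval: 10000
```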

SnowflakeDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| snowflake | | SnowflakeDataPlaneOptions | Yes | Snowflake configuration options. |

SnowflakeDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

SynapseDataPlane

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| synapse | spark_session_config: null | SynapseDataPlaneOptions | No | Synapse configuration options. |

SynapseDataPlaneOptions

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |

LivySparkSessionConfig

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| pool | | string | No | The pool to use for the Spark session. |
| driver_memory | | string | No | The memory to use for the Spark driver. |
| driver_cores | | integer | No | The number of cores to use for the Spark driver. |
| executor_memory | | string | No | The memory to use for the Spark executor. |
| executor_cores | | integer | No | The number of cores to use for each executor. |
| num_executors | | integer | No | The number of executors to use for the Spark session. |
| session_key_override | | string | No | The key to use for the Spark session. |
| max_concurrent_sessions | | integer | No | The maximum number of concurrent sessions of this spec to create. |
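
A hedged sketch of a Spark session configuration attached to a component through the Synapse data plane; the same spark_session_config block applies under fabric. The component name, pool name, and sizing values are hypothetical, and the memory strings assume the usual Spark size notation (for example "4g"), which this page does not confirm.

```yaml
project:
  name: SimpleProject
  defaults:
    - kind: Component
      name: heavy_transform             # hypothetical component name
      spec:
        data_plane:
          synapse:                      # SynapseDataPlaneOptions
            spark_session_config:       # LivySparkSessionConfig
              pool: sparkpool01         # hypothetical pool name
              driver_memory: 4g         # assumed Spark-style size string
              driver_cores: 2
              executor_memory: 8g       # assumed Spark-style size string
              executor_cores: 4
              num_executors: 2
              max_concurrent_sessions: 1
```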

ResourceMetadata

Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| source | | ResourceLocation | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | UUID of the event that is associated with creation of this resource. |

ResourceLocation

The origin or source information for the resource.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | | string | Yes | Path within repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within path file where the resource is defined. |