Project
A project is a group of related connections, flows and components, profiles, vaults, automations, and other code or configuration artifacts. A project file maps filesystem paths to the different kinds of artifacts that the platform can access when running flows for the project.
Examples
simple_project_name_description.yaml

project:
  name: SimpleProject
  description: A simple project with only a name and description.

custom_project_with_folders.yaml

project:
  name: CustomProject
  description: A project with custom source and test folders.
  sources:
    - custom_src/
  tests:
    - custom_tests/

project_with_version_and_folders.yaml

project:
  name: MyProject
  description: A project with specific version and multiple connection and flow folders.
  version: "1.0.0"
  connections:
    - connections/folder1/
    - connections/folder2/
  flows:
    - flows/folder1/
    - flows/folder2/

project_with_defaults.yaml

project:
  name: MyProject
  description: A project with default configurations for flows and components.
  version: "1.0.0"
  connections:
    - connections/folder1/
    - connections/folder2/
  flows:
    - flows/folder1/
    - flows/folder2/
  defaults:
    - kind: Flow
      name: "^flow-.*$"
      spec:
        data_plane:
          connection_name: "default-connection"
    - kind: Component
      name: "^component-.*$"
      spec:
        data_plane:
          connection_name: "default-connection"
Project
Below are the properties for the Project. Each property's type is described in the Property Details section further down this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
project | | ProjectOptions | Yes | Options that can be specified for a project. |
Property Details
ProjectOptions
Options that can be specified for a project.
Property | Default | Type | Required | Description |
---|---|---|---|---|
pip_packages | | array[string] | No | Python packages to install with pip. |
name | | string | No | The name of the project. |
description | | string | No | A brief description of the project. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it does not affect system behavior but may be helpful when analyzing project resources. |
parameters | | object | No | Dictionary of parameters to use for the resource. |
defaults | | array[ConfigFilter] | No | List of default configurations, each with a filter that selects the resource configs it applies to. |
version | | string | No | The version of the project. |
connections | ['connections/'] | array[string] | No | List of connection definition folders used in the project. |
flows | ['flows/'] | array[string] | No | List of flow definition folders used in the project. |
profiles | ['profiles/'] | array[string] | No | List of profile definition folders used in the project. |
sources | ['src/'] | array[string] | No | List of source definition folders used in the project. |
tests | ['tests/'] | array[string] | No | List of test definition folders used in the project. |
vaults | ['vaults/'] | array[string] | No | List of vault definition folders used in the project. |
actions | ['actions/'] | array[string] | No | List of action definition folders used in the project. |
automations | ['automations/'] | array[string] | No | List of automation definition folders used in the project. |
sensors | ['sensors/'] | array[string] | No | List of sensor definition folders used in the project. |
ssh_tunnels | ['ssh_tunnels/'] | array[string] | No | List of SSH tunnel definition folders used in the project. |
applications | ['applications/'] | array[string] | No | List of Application definition folders used in the project. |
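The sketch below combines several ProjectOptions properties in one project file. Only the property names come from the table above; the project name, package name, parameter key, and folder paths are hypothetical. Folder lists left out (here, flows and connections) are expected to fall back to the defaults shown in the table, and a list that is given presumably replaces its default rather than extending it.

```yaml
project:
  name: AnalyticsProject            # hypothetical project name
  description: A project that overrides several default folder locations.
  version: "2.1.0"
  pip_packages:
    - requests                      # hypothetical package installed with pip
  parameters:
    environment: staging            # hypothetical parameter key and value
  sources:
    - src/
    - shared_src/                   # hypothetical second source folder
  vaults:
    - secrets/vaults/               # replaces the default vaults/ folder
  automations:
    - automations/
    - automations/nightly/          # hypothetical nested folder
```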
ConfigFilter
A filter used to target configuration settings to a specific flow and/or component.
Property | Default | Type | Required | Description |
---|---|---|---|---|
kind | | string ("Flow", "Component") | Yes | The kind of resource to apply the config to. |
name | | Any of: string, array[string], RegexFilter, array[RegexFilter] | Yes | Name of the resource to apply the config to. |
flow_name | | string | No | Name of the flow to apply the config to. |
spec | | Any of: FlowSpec, ComponentSpec | No | Configuration to apply to the matching resource. |
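As an illustration of the ConfigFilter fields, the sketch below targets two flows by an explicit list of names and one component by name plus flow_name. All flow, component, column, and connection names are hypothetical. The assumption that flow_name narrows a Component filter to components inside that flow is inferred from the field description, not confirmed by this page.

```yaml
project:
  name: MyProject
  defaults:
    # Flow filter using array[string]: applies to both listed flows
    - kind: Flow
      name:
        - sales_daily               # hypothetical flow name
        - sales_hourly              # hypothetical flow name
      spec:
        data_plane:
          connection_name: "warehouse-connection"   # hypothetical connection
    # Component filter narrowed to a single flow (assumed behavior of flow_name)
    - kind: Component
      name: orders                  # hypothetical component name
      flow_name: sales_daily
      spec:
        data_plane:
          snowflake:
            cluster_by:
              - order_date          # hypothetical clustering column
```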
ComponentSpec
Specification for configuration applied to a component at runtime based on the config filter.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane | No | Data plane-specific configuration options for a component. |
skip | False | boolean | No | |
FlowSpec
Specification for configuration applied to a flow at runtime based on the config filter.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | DataPlane | No | The data plane that will be used for the flow at runtime. |
DataPlane
The external warehouse where data is persisted throughout the flow runs, and where primary computation on the data itself occurs.
Property | Default | Type | Required | Description |
---|---|---|---|---|
connection_name | | string | No | Name of the connection to use as the data plane. |
metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this flow. Where applicable, the prefix may include a database (or project) and a schema (or dataset). If not provided, metadata tables are stored alongside the output data tables, as determined by the data plane's connection configuration. |
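The sketch below sets both DataPlane properties for one flow through a project default. The flow name, connection name, and prefix value are hypothetical, and the dot-separated `database.schema` form of the prefix is an assumption; the exact syntax likely depends on the warehouse behind the connection.

```yaml
project:
  name: MyProject
  defaults:
    - kind: Flow
      name: reporting_daily                       # hypothetical flow name
      spec:
        data_plane:
          connection_name: "snowflake-prod"       # hypothetical connection
          # Assumed database.schema form; metadata tables would get this prefix
          metadata_storage_location_prefix: "ANALYTICS.FLOW_METADATA"
```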
RegexFilter
A filter used to target resources based on a regex pattern.
Property | Default | Type | Required | Description |
---|---|---|---|---|
regex | | string | Yes | The regular expression used to match resource names. |
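This page does not say whether a plain string in `name` is matched literally or as a pattern, so the RegexFilter form makes the intent explicit. The sketch below shows a single RegexFilter and a list of RegexFilter objects; the patterns, connection name, and clustering column are hypothetical, and treating the list as "match any pattern" is an assumption.

```yaml
project:
  name: MyProject
  defaults:
    # Single RegexFilter: every flow whose name starts with "ingest_"
    - kind: Flow
      name:
        regex: "^ingest_.*$"
      spec:
        data_plane:
          connection_name: "ingest-connection"    # hypothetical connection
    # array[RegexFilter]: components matching either pattern (assumed OR)
    - kind: Component
      name:
        - regex: "^stg_.*$"
        - regex: "_raw$"
      spec:
        data_plane:
          bigquery:
            cluster_by:
              - customer_id                       # hypothetical column
```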
BigQueryDataPlane
Property | Default | Type | Required | Description |
---|---|---|---|---|
bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |
BigQueryDataPlaneOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition-by clause for the table. |
cluster_by | | array[string] | No | Clustering keys to add to the table. |
BigQueryRangePartitioning
Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
range | | RangeOptions | Yes | Range partitioning options. |
BigQueryTimePartitioning
Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
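A component default that partitions its BigQuery output table by day and clusters it on two columns might look like the sketch below. The name pattern and column names are hypothetical; the nesting follows the ComponentSpec, BigQueryDataPlane, and BigQueryDataPlaneOptions tables. Supplying `granularity` (rather than `range`) is what presumably selects BigQueryTimePartitioning.

```yaml
project:
  name: MyProject
  defaults:
    - kind: Component
      name:
        regex: "^fct_.*$"           # hypothetical naming convention
      spec:
        data_plane:
          bigquery:
            partition_by:
              field: event_date     # hypothetical DATE/TIMESTAMP column
              granularity: DAY
            cluster_by:
              - customer_id         # hypothetical clustering columns
              - region
```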
DuckdbDataPlane
Property | Default | Type | Required | Description |
---|---|---|---|---|
duckdb | | DuckDbDataPlaneOptions | No | DuckDB configuration options. |
DuckDbDataPlaneOptions
No properties defined.
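Because DuckDbDataPlaneOptions defines no properties, selecting the DuckDB data plane for a component presumably needs only an empty mapping under `duckdb`. The sketch below assumes that form; the component name is hypothetical.

```yaml
project:
  name: MyProject
  defaults:
    - kind: Component
      name: local_scratch           # hypothetical component name
      spec:
        data_plane:
          duckdb: {}                # no DuckDB-specific options are defined
```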
RangeOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
start | | integer | Yes | Start of the range partitioning. |
end | | integer | Yes | End of the range partitioning. |
interval | | integer | Yes | Interval of the range partitioning. |
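Integer-range partitioning combines BigQueryRangePartitioning with RangeOptions, as in the sketch below. The component name, column, and bounds are hypothetical. These options presumably map onto BigQuery's native integer-range partitioning, where `start` is inclusive, `end` is exclusive, and each partition spans `interval` values, so these bounds would yield (1000000 - 0) / 10000 = 100 partitions.

```yaml
project:
  name: MyProject
  defaults:
    - kind: Component
      name: customer_snapshots      # hypothetical component name
      spec:
        data_plane:
          bigquery:
            partition_by:
              field: customer_id    # hypothetical INTEGER column
              range:
                start: 0
                end: 1000000
                interval: 10000
```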
SnowflakeDataPlane
Property | Default | Type | Required | Description |
---|---|---|---|---|
snowflake | | SnowflakeDataPlaneOptions | Yes | Snowflake configuration options. |
SnowflakeDataPlaneOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
cluster_by | | array[string] | No | Clustering keys to add to the table. |
SynapseDataPlane
Property | Default | Type | Required | Description |
---|---|---|---|---|
synapse | spark_session_config: null | SynapseDataPlaneOptions | No | Synapse configuration options. |
SynapseDataPlaneOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
spark_session_config | | LivySparkSessionConfig | No | Spark session configuration. |
LivySparkSessionConfig
Property | Default | Type | Required | Description |
---|---|---|---|---|
pool | | string | No | The pool to use for the Spark session. |
driver_memory | | string | No | The memory to use for the Spark driver. |
driver_cores | | integer | No | The number of cores to use for the Spark driver. |
executor_memory | | string | No | The memory to use for each Spark executor. |
executor_cores | | integer | No | The number of cores to use for each executor. |
num_executors | | integer | No | The number of executors to use for the Spark session. |
session_key_override | | string | No | The key to use for the Spark session. |
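The sketch below gives one component a dedicated Synapse Spark session. The component name, pool name, and sizing values are hypothetical, and the memory strings assume the Spark/Livy size notation (for example `28g`); check the values against the node sizes of your Spark pool.

```yaml
project:
  name: MyProject
  defaults:
    - kind: Component
      name: heavy_transform         # hypothetical component name
      spec:
        data_plane:
          synapse:
            spark_session_config:
              pool: sparkpool01     # hypothetical Synapse Spark pool
              driver_memory: 28g    # assumed Spark/Livy memory notation
              driver_cores: 4
              executor_memory: 28g
              executor_cores: 4
              num_executors: 2
```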
ResourceMetadata
Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
Property | Default | Type | Required | Description |
---|---|---|---|---|
source | | ResourceLocation | No | The origin or source information for the resource. |
source_event_uuid | | string | No | UUID of the event that is associated with creation of this resource. |
ResourceLocation
The origin or source information for the resource.
Property | Default | Type | Required | Description |
---|---|---|---|---|
path | | string | Yes | Path of the repository file in which the resource is defined. |
first_line_number | | integer | No | Line number in the file at `path` where the resource definition begins. |
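The nesting of ResourceMetadata and ResourceLocation is shown in the sketch below. In most cases this block is probably populated by the platform rather than written by hand; the file name `project.yaml` and the all-zero UUID are placeholders, not real values.

```yaml
project:
  name: MyProject
  metadata:
    source:
      path: project.yaml            # hypothetical project file name
      first_line_number: 1
    source_event_uuid: "00000000-0000-0000-0000-000000000000"   # placeholder UUID
```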