Databricks Data Plane Configuration
DatabricksDataPlane
DatabricksDataPlane is defined beneath the following ancestor nodes in the YAML structure:
Below are the properties for the DatabricksDataPlane. Each property links to the specific details section further down in this page.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| databricks | cluster_by: null, pyspark_job_cluster_id: null, table_properties: null | | No | Databricks configuration options. |
Property Details
BackfillRun
Defines the parameters for a backfill run.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| backfill_run | | | Yes | Backfill run options. |
BackfillRunOptions
Options for a backfill run.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| flow_name | | string | Yes | Name of the Flow that is to be backfilled. |
| start_time | | string | Yes | Start time of the time range to be backfilled. |
| end_time | | string | Yes | End time of the time range to be backfilled. |
| granularity | | string ("day", "week", "month") | Yes | Time granularity to use for backfill. Must be one of: 'day', 'week', 'month'. The backfill runner divides the date range into Flow runs of this granularity and launches these Flow runs. |
| max_concurrent_flow_runs | 1 | integer | No | Maximum number of concurrent Flow runs used for backfill. This is used to limit the number of Flow runners (and hence cluster resources) that are launched simultaneously. |
| backfill_order | | string ("forward_chronological", "reverse_chronological") | No | Order to use for backfilling - either forward or reverse chronological order. |
| flow_run_options | | | No | Additional options for each Flow run launched during the backfill. |
| run_final_sync | | boolean | No | Boolean flag indicating whether to run a final sync after concurrent backfill Flow runs. This final sync is a single Flow run that is executed without any time parameters, and is meant to sync the data to the latest state and capture any missing time intervals. |
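
As a rough illustration, a backfill run built only from the properties in the table above might look like the sketch below. The resource name, Flow name, and dates are hypothetical, and the timestamp format is an assumption: the table states only that `start_time` and `end_time` are strings.

```yaml
# Hypothetical backfill run; every value is a placeholder.
backfill_run:
  name: orders_backfill_2024          # hypothetical resource name
  flow_name: orders                   # hypothetical Flow to backfill
  start_time: "2024-01-01T00:00:00Z"  # assumed ISO 8601 format
  end_time: "2024-04-01T00:00:00Z"    # assumed ISO 8601 format
  granularity: month                  # one of: day, week, month
  max_concurrent_flow_runs: 2         # default is 1
  backfill_order: forward_chronological
  run_final_sync: true                # one extra Flow run with no time parameters
```

With `granularity: month`, the range would be divided into month-sized Flow runs, with at most two running at the same time.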
Component
A Component is a fundamental building block of a data Flow. Supported Component types include: Read, Transform, Task, Test, and more.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Component configuration options. |
CustomPythonReadComponent
Component that reads data using user-defined custom Python code.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| flow_name | | string | No | Name of the Flow that the Component belongs to. |
| data_maintenance | | | No | The data maintenance configuration options for the Component. |
| tests | | | No | Defines tests to run on this Component's data. |
| custom_python_read | | | Yes | |
Flow
A Flow is the primary unit of execution in Ascend and contains a collection of Components assembled into a directed acyclic graph (DAG).
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| flow | | | Yes | |
FlowOptions
Defines the options for a Flow.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| parameters | | object with property values of type None | No | Dictionary of parameters to use for resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| data_plane | | | No | Data plane to use for the flow. |
| version | | string | No | Flow version. |
| bootstrap | | string | No | Bootstrap command to run within the Docker container. |
| runner | | RunnerConfig | No | Runner configuration. |
| component_concurrency | | integer | No | Maximum number of concurrent Components to run within this Flow. |
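
A minimal sketch of a Flow definition, using only properties from the table above, could look like this. The Flow name, description, version string, and parameter are hypothetical placeholders; `data_plane`, `runner`, and `bootstrap` are left out because their value shapes are not described in this section.

```yaml
# Hypothetical Flow definition; all values are placeholders.
flow:
  name: orders                     # hypothetical Flow name (required)
  description: Loads and cleans order data.
  version: "0.1.0"                 # assumed version format
  component_concurrency: 4         # at most 4 Components run at once
  parameters:
    region: us-east                # hypothetical parameter
```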
FlowRun
Defines the run-specific parameters for a Flow. One Flow can have multiple Flow runs.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| flow_run | | | Yes | |
FlowRunBaseOptions
Base options for a Flow Run
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| parameters | | object with property values of type None | No | Dictionary of parameters to use for resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| run_tests | True | boolean | No | Boolean flag indicating whether to run tests after processing data. |
| store_test_results | | boolean | No | Boolean flag indicating whether to store test results. |
| components | | array[string] | No | List of Component names to run. |
| component_categories | | array[string] | No | List of Component categories to run. |
| halt_flow_on_error | | boolean | No | Boolean flag indicating whether to halt the Flow on error. |
| disable_optimizers | | boolean | No | Boolean flag indicating whether to disable optimizers. |
| disable_incremental_metadata_collection | | boolean | No | Boolean flag indicating whether to disable collection of Incremental Read and Transform Component metadata. |
| full_refresh | False | boolean | No | Boolean flag indicating whether to perform a full refresh of each Component. ⚠ If true, will drop all internal data and metadata tables/views and re-compute them from scratch. |
| update_materialization_type | False | boolean | No | Boolean flag indicating whether to update Component materialization types (e.g., changing types between 'simple', 'view', 'incremental', and 'smart'). ⚠ If materialization type changes are detected, existing data and metadata tables/views will be dropped and re-computed from scratch. Otherwise, existing data and metadata tables/views will be preserved and type changes will result in an error. |
| runner_overrides | | RunnerConfig | No | Override runner configuration for this specific flow run. If not specified, inherits from the flow's runner configuration, or the deployment/workspace defaults. |
FlowRunOptions
Options for a Flow Run
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| parameters | | object with property values of type None | No | Dictionary of parameters to use for resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a Flow run. In most cases, it doesn't affect the system behavior but may be helpful to analyze project resources. |
| run_tests | True | boolean | No | Boolean flag indicating whether to run tests after processing data. |
| store_test_results | | boolean | No | Boolean flag indicating whether to store test results. |
| components | | array[string] | No | List of Component names to run. |
| component_categories | | array[string] | No | List of Component categories to run. |
| halt_flow_on_error | | boolean | No | Boolean flag indicating whether to halt the Flow on error. |
| disable_optimizers | | boolean | No | Boolean flag indicating whether to disable optimizers. |
| disable_incremental_metadata_collection | | boolean | No | Boolean flag indicating whether to disable collection of Incremental Read and Transform Component metadata. |
| full_refresh | False | boolean | No | Boolean flag indicating whether to perform a full refresh of each Component. ⚠ If true, will drop all internal data and metadata tables/views and re-compute them from scratch. |
| update_materialization_type | False | boolean | No | Boolean flag indicating whether to update Component materialization types (e.g., changing types between 'simple', 'view', 'incremental', and 'smart'). ⚠ If materialization type changes are detected, existing data and metadata tables/views will be dropped and re-computed from scratch. Otherwise, existing data and metadata tables/views will be preserved and type changes will result in an error. |
| runner_overrides | | RunnerConfig | No | Override runner configuration for this specific flow run. If not specified, inherits from the flow's runner configuration, or the deployment/workspace defaults. |
| name | | string | No | Flow run name. |
| flow_name | | string | Yes | Name of the Flow to run. |
| event_start_time | | string | No | Event start time to be used for time-series processing. |
| event_end_time | | string | No | Event end time to be used for time-series processing. |
Profile
A Profile is a set of configuration options and parameters that define the target where customer code is compiled/run.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| profile | | | Yes | Options and parameters for Profiles. |
ProfileOptions
Configuration options and parameters for Profiles.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| pip_packages | | array[string] | No | Python PIP packages to install. |
| parameters | | object with property values of type None | No | Dictionary of parameters to use for resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| ignore | | array[string] | No | Additional ignore patterns to apply when using this profile (follows .gitignore syntax). |
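
A Profile using these options might be sketched as follows. The Profile name, package, parameter, and ignore patterns are hypothetical; only the property names come from the table above.

```yaml
# Hypothetical Profile; all values are placeholders.
profile:
  name: dev                    # hypothetical Profile name (required)
  description: Development target for local iteration.
  pip_packages:
    - requests                 # example package to install
  parameters:
    environment: dev           # hypothetical parameter
  ignore:                      # .gitignore-style patterns
    - "*.tmp"
    - scratch/
```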
Project
A Project is a group of related Connections, Flows/Components, Profiles, Vaults, Automations and other code/configuration artifacts. Project files define the mapping of filesystem paths to different kinds of artifacts that the platform can access when running Flows for the Project.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| project | | | Yes | Project options. |
ProjectOptions
Options that can be specified for a Project.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| pip_packages | | array[string] | No | Python PIP packages to install. |
| parameters | | object with property values of type None | No | Dictionary of parameters to use for resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| version | | string | No | Project version. |
| connections | ['connections/'] | array[string] | No | List of Connection definition folders used in the Project. |
| flows | ['flows/'] | array[string] | No | List of Flow definition folders used in the Project. |
| profiles | ['profiles/'] | array[string] | No | List of Profile definition folders used in the Project. |
| sources | ['src/'] | array[string] | No | List of source definition folders used in the Project. |
| tests | ['tests/'] | array[string] | No | List of test definition folders used in the Project. |
| vaults | ['vaults/'] | array[string] | No | List of Vault definition folders used in the Project. |
| actions | ['actions/'] | array[string] | No | List of Action definition folders used in the Project. |
| automations | ['automations/'] | array[string] | No | List of Automation definition folders used in the Project. |
| sensors | ['sensors/'] | array[string] | No | List of Sensor definition folders used in the Project. |
| ssh_tunnels | ['ssh_tunnels/'] | array[string] | No | List of SSH tunnel definition folders used in the Project. |
| applications | ['applications/'] | array[string] | No | List of Application definition folders used in the Project. |
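
Because every folder list has a default, a Project file only needs to name the folders it wants to change. The sketch below is an assumption-laden example: the Project name, version string, and the extra `shared_src/` folder are hypothetical.

```yaml
# Hypothetical Project file; values are placeholders.
project:
  name: analytics              # hypothetical Project name (required)
  version: "0.1.0"             # assumed version format
  flows:
    - flows/                   # same as the default, shown for clarity
  sources:
    - src/
    - shared_src/              # hypothetical additional source folder
```

Folder lists that are omitted, such as `connections` or `vaults`, would fall back to the defaults shown in the table.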
ConfigFilter
Filter used to target configuration settings to a specific Flow and/or Component.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| kind | | string ("Flow", "Component") | Yes | Resource kind to target with this configuration. |
| name | | Any of: string, array[string], array[None] | Yes | Name of the resource to target with this configuration. |
| flow_name | | string | No | Name of the Flow to target with this configuration. |
| spec | | Any of: | No | Dictionary of parameters to use for the resource. |
ComponentSpec
Specification for configuration applied to a component at runtime based on the config filter.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
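
The ConfigFilter and ComponentSpec tables can be read together: the filter selects resources and `spec` carries the settings to apply. The fragment below is a sketch under two assumptions that this section does not confirm: that `spec` accepts ComponentSpec fields when `kind` is `Component`, and that such a filter appears as an entry in a `defaults` list. The Flow and Component names are hypothetical.

```yaml
# Hypothetical filter entry; its parent key is not shown because the
# nesting (for example under `defaults`) is an assumption.
kind: Component
name:                          # a single string is also allowed
  - orders_raw                 # hypothetical Component names
  - orders_clean
flow_name: orders              # hypothetical Flow name
spec:
  skip: false
  data_plane:
    databricks:
      cluster_by:
        - order_date           # hypothetical clustering column
```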
ReadComponent
Component that reads data from a system.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| flow_name | | string | No | Name of the Flow that the Component belongs to. |
| data_maintenance | | | No | The data maintenance configuration options for the Component. |
| tests | | | No | Defines tests to run on this Component's data. |
| read | | One of: GenericFileReadComponent, LocalFileReadComponent, SFTPReadComponent, S3ReadComponent, GcsReadComponent, AbfsReadComponent, HttpReadComponent, MSSQLReadComponent, MySQLReadComponent, OracleReadComponent, PostgresReadComponent, SnowflakeReadComponent, BigQueryReadComponent, DatabricksReadComponent | Yes | Read component that reads data from a system. |
TransformComponent
Component that executes SQL or Python code to transform data.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
| flow_name | | string | No | Name of the Flow that the Component belongs to. |
| data_maintenance | | | No | The data maintenance configuration options for the Component. |
| tests | | | No | Defines tests to run on this Component's data. |
| transform | | One of: SqlTransform, PythonTransform, SnowparkTransform, PySparkTransform | Yes | Transform that executes SQL or Python code for data transformation. |
DatabricksDataPlaneOptions
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your Data Plane. |
| pyspark_job_cluster_id | | string | No | ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
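
Putting the three options together, a Component's `data_plane` block for Databricks might look like the sketch below. The cluster ID and column names are hypothetical, and `delta.appendOnly` is used only as one example of a Delta table property; consult the Databricks documentation linked above for the properties that apply to your tables.

```yaml
# Hypothetical Databricks Data Plane settings for a Component.
data_plane:
  databricks:
    table_properties:
      delta.appendOnly: "true"       # values are strings, per the table above
    pyspark_job_cluster_id: "0123-456789-abcde123"  # hypothetical cluster ID
    cluster_by:
      - event_date                   # hypothetical clustering columns
      - customer_id
```

All three keys default to null, so any of them can be omitted.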