Profile

A Profile is a set of configuration options and parameters that define the target where customer code is compiled/run.

Examples​

profile:
  parameters:
    param1: value1
    param2: value2

Profile​

Below are the properties for the Profile. Each property links to its details section further down on this page.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| profile | | | Yes | Options and parameters for Profiles. |

Property Details​

ProfileOptions​

Configuration options and parameters for Profiles.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| pip_packages | | array[string] | No | Python PIP packages to install. |
| parameters | | object with property values of type None | No | Dictionary of parameters to use for resource. |
| defaults | | array[None] | No | List of default configs with filters that can be applied to a resource config. |
| description | | string | No | Brief description of what the model does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
| name | | string | Yes | The name of the model. |
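
As an illustration of how these properties fit together, the following sketch fills in several ProfileOptions fields. The profile name, package names, and parameter keys are hypothetical, and the nesting under profile follows the example at the top of this page.

```yaml
profile:
  name: dev                       # required; hypothetical profile name
  description: Settings for the development environment.
  pip_packages:                   # array[string]; package names are examples only
    - requests
    - pandas
  parameters:                     # free-form dictionary; keys are hypothetical
    region: us-east-1
    batch_size: 500
```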

ConfigFilter​

Filter used to target configuration settings to a specific Flow and/or Component.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| kind | | string ("Flow", "Component") | Yes | Resource kind to target with this configuration. |
| name | | Any of: string, array[string], array[None] | Yes | Name of the resource to target with this configuration. |
| flow_name | | string | No | Name of the Flow to target with this configuration. |
| spec | | Any of: | No | Dictionary of parameters to use for the resource. |
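
A minimal sketch of two filters placed in a Profile's defaults list may help here. It assumes that defaults holds ConfigFilter entries and that spec accepts ComponentSpec properties when kind is Component; the table above does not state either, and all resource names are hypothetical.

```yaml
profile:
  name: dev                       # hypothetical profile name
  defaults:
    # Target a single Component inside one Flow.
    - kind: Component
      name: orders_cleaned        # hypothetical Component name
      flow_name: sales            # hypothetical Flow name
      spec:
        skip: true                # assumed ComponentSpec property
    # Target several Flows at once by passing an array of names.
    - kind: Flow
      name:
        - sales
        - marketing
```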

ComponentSpec​

Specification for configuration applied to a component at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| skip | | boolean | No | Boolean flag indicating whether to skip processing for the Component or not. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
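
The three ComponentSpec properties could be combined roughly as follows. This is a sketch only: the column name is hypothetical, and the retry_strategy and snowflake keys are taken from the RetryStrategy and SnowflakeDataPlane sections further down this page.

```yaml
spec:
  skip: false
  retry_strategy:
    stop_after_attempt: 3         # see RetryStrategy
  data_plane:
    snowflake:                    # one of the listed Data Plane variants
      cluster_by:
        - order_date              # hypothetical column name
```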

FlowSpec​

Specification for configuration applied to a Flow at runtime based on the config filter.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| data_plane | | | No | The data plane that will be used for the flow at runtime. |

DataPlane​

The external warehouse where data is persisted throughout the Flow runs, and where primary computation on the data itself occurs.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| connection_name | | string | No | |
| metadata_storage_location_prefix | | string | No | Prefix to prepend to the names of metadata tables created for this Flow. The prefix may include database/project/etc. and schema/dataset/etc where applicable. If not provided, metadata tables are stored alongside the output data tables per the Data Plane's Connection configuration. |
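
To make the prefix behavior concrete, here is a hedged sketch of a Flow-level spec. It assumes that FlowSpec's data_plane takes the DataPlane properties listed above. The connection name and prefix are hypothetical, and the dot-separated database.schema form of the prefix is an assumption about the target warehouse.

```yaml
spec:
  data_plane:
    connection_name: warehouse_main                          # hypothetical Connection name
    metadata_storage_location_prefix: analytics.meta.sales_  # hypothetical database.schema.prefix
```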

RegexFilter​

A filter used to target resources based on a regex pattern.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| regex | | string | Yes | The regex to filter the resources. |

BigQueryDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| bigquery | | | Yes | BigQuery configuration options. |

BigQueryDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| partition_by | | Any of: | No | Partition By clause for the table. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

BigQueryRangePartitioning​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| range | | | Yes | Range partitioning options. |

BigQueryTimePartitioning​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| field | | string | Yes | Field to partition by. |
| granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
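
A time-partitioned BigQuery table might be configured along these lines. The sketch assumes that partition_by accepts the BigQueryTimePartitioning properties directly, which the BigQueryDataPlaneOptions table does not spell out; the column names are hypothetical.

```yaml
data_plane:
  bigquery:
    partition_by:
      field: created_at           # hypothetical timestamp column
      granularity: DAY            # one of DAY, HOUR, MONTH, YEAR
    cluster_by:
      - customer_id               # hypothetical column name
```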

DatabricksDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| databricks | cluster_by: null; pyspark_job_cluster_id: null; table_properties: null | | No | Databricks configuration options. |

DatabricksDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| table_properties | | object with property values of type string | No | Table properties to include when creating the data table. This setting is equivalent to the CREATE TABLE ... TBLPROPERTIES clause. Please refer to the Databricks documentation at https://docs.databricks.com/aws/en/delta/table-properties for available properties depending on your Data Plane. |
| pyspark_job_cluster_id | | string | No | ID of the compute cluster to use for PySpark jobs. |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |
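
For illustration, the Databricks options could be set as in the sketch below. The two table properties are standard Delta table properties and their values are written as strings, as the type above requires. The cluster ID and column name are hypothetical.

```yaml
data_plane:
  databricks:
    table_properties:
      delta.appendOnly: "true"
      delta.logRetentionDuration: "interval 30 days"
    pyspark_job_cluster_id: "0123-456789-abcde123"   # hypothetical cluster ID
    cluster_by:
      - event_date                                   # hypothetical column name
```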

DuckdbDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| duckdb | | | No | DuckDB configuration options. |

DuckDbDataPlaneOptions​

No properties defined.

FabricDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| fabric | spark_session_config: null | | No | Fabric configuration options. |

FabricDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | | No | Spark session configuration. |

RangeOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| start | | integer | Yes | Start of the range partitioning. |
| end | | integer | Yes | End of the range partitioning. |
| interval | | integer | Yes | Interval of the range partitioning. |
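
The three range options can be read together as bucket boundaries. The sketch below assumes that partition_by accepts the BigQueryRangePartitioning properties and that its range property takes RangeOptions; neither link is shown in the tables on this page. The column name and numbers are hypothetical.

```yaml
data_plane:
  bigquery:
    partition_by:
      field: customer_id          # hypothetical integer column
      range:
        start: 0                  # first bucket begins here
        end: 1000                 # range stops here
        interval: 10              # width of each bucket
```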

SnowflakeDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| snowflake | | | Yes | Snowflake configuration options. |

SnowflakeDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| cluster_by | | array[string] | No | Clustering keys to be added to the table. |

SynapseDataPlane​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| synapse | spark_session_config: null | | No | Synapse configuration options. |

SynapseDataPlaneOptions​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| spark_session_config | | | No | Spark session configuration. |

LivySparkSessionConfig​

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| pool | | string | No | Pool to use for the Spark session. |
| driver_memory | | string | No | Memory to use for the Spark driver. |
| driver_cores | | integer | No | Number of cores to use for the Spark driver. |
| executor_memory | | string | No | Memory to use for the Spark executor. |
| executor_cores | | integer | No | Number of cores to use for each executor. |
| num_executors | | integer | No | Number of executors to use for the Spark session. |
| session_key_override | | string | No | Key to use for the Spark session. |
| max_concurrent_sessions | | integer | No | Maximum number of concurrent sessions allowed for this configuration. |
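
A Spark session configuration might look like the following sketch. It assumes that the spark_session_config property of the Synapse options takes the LivySparkSessionConfig properties above, which this page does not state explicitly. The pool name is hypothetical, and the memory strings follow the usual Spark size notation (for example 8g), which is also an assumption here.

```yaml
data_plane:
  synapse:
    spark_session_config:
      pool: sparkpool01           # hypothetical pool name
      driver_memory: 8g           # assumed Spark-style size string
      driver_cores: 2
      executor_memory: 8g
      executor_cores: 4
      num_executors: 2
      max_concurrent_sessions: 1
```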

ResourceMetadata​

Meta information of a resource. In most cases, it doesn't affect the system behavior but may be helpful to analyze Project resources.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| source | | | No | The origin or source information for the resource. |
| source_event_uuid | | string | No | Event UUID associated with creation of this resource. |

ResourceLocation​

The origin or source information for the resource.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | | string | Yes | Path within repository files where the resource is defined. |
| first_line_number | | integer | No | First line number within path file where the resource is defined. |

RetryStrategy​

Retry strategy configuration for Component operations. This configuration uses the tenacity library to implement retries, and the configuration options map directly to tenacity's retry parameters. Details on the tenacity library can be found here: https://tenacity.readthedocs.io/en/latest/api.html#retry-main-api

The current implementation includes:

- stop_after_attempt: Maximum number of retry attempts.
- stop_after_delay: Give up on retries one attempt before you would exceed the delay.

At least one of the two parameters must be supplied. Additional retry parameters will be added as needed to support more complex use cases.

| Property | Default | Type | Required | Description |
| --- | --- | --- | --- | --- |
| stop_after_attempt | | integer | No | Number of retry attempts before giving up. If set to None, it will not stop after any number of attempts. |
| stop_after_delay | | integer | No | Maximum time (in seconds) to spend on retries before giving up. If set to None, it will not stop after any time delay. |
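
As a small sketch, a retry strategy that sets both limits could be written as shown below. The values are illustrative. How the two limits interact when both are set is not described on this page, so confirm the behavior before relying on it.

```yaml
retry_strategy:
  stop_after_attempt: 3           # give up after three attempts
  stop_after_delay: 120           # seconds to spend on retries before giving up
```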