BigQuery Data Plane Configuration
BigQueryDataPlane
BigQueryDataPlane is defined beneath the following ancestor nodes in the YAML structure:
Below are the properties for the BigQueryDataPlane. Each property links to the specific details section further down in this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
bigquery | | BigQueryDataPlaneOptions | Yes | BigQuery configuration options. |
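As a rough illustration, assuming the block is attached through a component's `data_plane` property (as the component tables further down this page suggest), a BigQueryDataPlane entry might look like the fragment below. The column name is a placeholder, and only the `data_plane` portion of a component is shown.

```yaml
# Hypothetical fragment: the surrounding component definition is omitted.
data_plane:
  bigquery:              # value is a BigQueryDataPlaneOptions object
    cluster_by:
      - customer_id      # placeholder column name
```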
Property Details
BackfillRun
Defines the parameters for a backfill run.
Property | Default | Type | Required | Description |
---|---|---|---|---|
backfill_run | | BackfillRunOptions | Yes | Backfill run options. |
BackfillRunOptions
Options for a backfill run.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
flow_name | | string | Yes | The name of the flow that is to be backfilled. |
start_time | | string | Yes | Start time of the time range to be backfilled. |
end_time | | string | Yes | End time of the time range to be backfilled. |
granularity | | string ("day", "week", "month") | Yes | The time granularity to use for the backfill. Must be one of: 'day', 'week', 'month'. The backfill runner divides the date range into flow runs of this granularity and launches them. |
max_concurrent_flow_runs | | integer | No | The maximum number of concurrent flow runs used for the backfill. This limits the number of flow runners (and hence cluster resources) that are launched at once. |
backfill_order | | string ("forward_chronological", "reverse_chronological") | No | The order to use for backfilling: either forward or reverse chronological order. |
flow_run_options | | FlowRunBaseOptions | No | Additional options for each flow run launched during the backfill. |
run_final_sync | | boolean | No | A boolean flag indicating whether to run a final sync after the concurrent backfill flow runs have finished. This final sync is a single flow run that is executed without any time parameters, and is meant to sync the data to the latest state and capture any missing time intervals. |
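The options above could be combined as in the sketch below. The flow name is hypothetical, and the timestamp format is an assumption (this page only states that `start_time` and `end_time` are strings), so check the expected format before relying on it.

```yaml
backfill_run:
  flow_name: sales_flow                  # hypothetical flow name
  start_time: "2024-01-01T00:00:00Z"     # assumed ISO 8601 format
  end_time: "2024-03-31T00:00:00Z"       # assumed ISO 8601 format
  granularity: week                      # one of: day, week, month
  max_concurrent_flow_runs: 4            # cap on simultaneous flow runs
  backfill_order: reverse_chronological  # or forward_chronological
  run_final_sync: true                   # one extra run without time parameters
```

With `granularity: week`, the runner would split the range into week-sized flow runs and launch at most four of them at a time.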
Component
A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: ReadComponent, TransformComponent, TaskComponent, SingularTestComponent, CustomPythonReadComponent, WriteComponent, CompoundComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |
CustomPythonReadComponent
A component that reads data using user-defined, custom Python code.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane | No | Data Plane-specific configuration options for a component. |
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
flow_name | | string | No | The name of the flow that the component belongs to. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the component. |
skip_for_time_series_runs | | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. |
tests | | ComponentTestColumn | No | Defines tests to run on the data of this component. |
custom_python_read | | CustomPythonReadOptions | Yes | |
Flow
A flow is the primary unit of execution in Ascend and contains a collection of components assembled into a directed acyclic graph (DAG).
Property | Default | Type | Required | Description |
---|---|---|---|---|
flow | | FlowOptions | Yes | |
FlowOptions
Defines the options for a Flow.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
parameters | | object | No | Dictionary of parameters to use for the resource. |
defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
data_plane | | DataPlane | No | Data plane to use for the flow. |
version | | string | No | The version of the flow. |
bootstrap | | string | No | Bootstrap command to run within the Docker container. |
runner | ascend | string | No | Runner ID to use for running the flow. Defaults to 'ascend'. |
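A minimal sketch of a flow definition using these options might look like the following. The flow name, description, version string, and parameter key are all placeholders chosen for illustration.

```yaml
flow:
  name: sales_flow                 # hypothetical flow name
  description: Loads and cleans daily sales data.
  version: "0.1.0"                 # placeholder version string
  runner: ascend                   # matches the documented default
  parameters:
    region: emea                   # hypothetical parameter
```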
FlowRun
Defines the run-specific parameters for a Flow. One flow can have multiple flow runs.
Property | Default | Type | Required | Description |
---|---|---|---|---|
flow_run | | FlowRunOptions | Yes | |
FlowRunBaseOptions
Base options for a Flow Run.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
parameters | | object | No | Dictionary of parameters to use for the resource. |
defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
run_tests | True | boolean | No | A boolean flag indicating whether to run tests after processing the data. |
store_test_results | | boolean | No | A boolean flag indicating whether to store the test results. |
components | | array[string] | No | List of component names to run. |
component_categories | | array[string] | No | List of component categories to run. |
halt_flow_on_error | | boolean | No | A boolean flag indicating whether to halt the flow on error. |
disable_optimizers | | boolean | No | A boolean flag indicating whether to disable optimizers. |
disable_incremental_metadata_collection | | boolean | No | A boolean flag indicating whether to disable the collection of incremental RC/Transform metadata. |
FlowRunOptions
Options for a Flow Run.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | No | The name of the FlowRun. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
parameters | | object | No | Dictionary of parameters to use for the resource. |
defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
run_tests | True | boolean | No | A boolean flag indicating whether to run tests after processing the data. |
store_test_results | | boolean | No | A boolean flag indicating whether to store the test results. |
components | | array[string] | No | List of component names to run. |
component_categories | | array[string] | No | List of component categories to run. |
halt_flow_on_error | | boolean | No | A boolean flag indicating whether to halt the flow on error. |
disable_optimizers | | boolean | No | A boolean flag indicating whether to disable optimizers. |
disable_incremental_metadata_collection | | boolean | No | A boolean flag indicating whether to disable the collection of incremental RC/Transform metadata. |
flow_name | | string | Yes | The name of the flow that is to be run. |
event_start_time | | string | No | Event start time to be used for time-series processing. |
event_end_time | | string | No | Event end time to be used for time-series processing. |
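For illustration, a flow run restricted to two components and a one-day event window could be sketched as follows. The flow and component names are hypothetical, and the timestamp format is an assumption, since this page only describes the event times as strings.

```yaml
flow_run:
  flow_name: sales_flow                     # hypothetical flow name (required)
  run_tests: true                           # documented default is True
  halt_flow_on_error: true
  components:                               # hypothetical component names
    - orders_read
    - orders_clean
  event_start_time: "2024-01-01T00:00:00Z"  # assumed ISO 8601 format
  event_end_time: "2024-01-02T00:00:00Z"    # assumed ISO 8601 format
```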
Profile
A profile is a set of configuration options (and parameters) that define the target where customer code is compiled/run.
Property | Default | Type | Required | Description |
---|---|---|---|---|
profile | | ProfileOptions | Yes | Options (and parameters) for the profile. |
ProfileOptions
Configuration options (and parameters) for a profile.
Property | Default | Type | Required | Description |
---|---|---|---|---|
pip_packages | | array[string] | No | Python pip packages to install. |
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
parameters | | object | No | Dictionary of parameters to use for the resource. |
defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
Project
A project is a group of related connections, flows/components, profiles, vaults, automations and other code/configuration artifacts. Project files define the mapping of filesystem paths to different kinds of artifacts that the platform can access when running flows for the project.
Property | Default | Type | Required | Description |
---|---|---|---|---|
project | | ProjectOptions | Yes | |
ProjectOptions
Options that can be specified for a project.
Property | Default | Type | Required | Description |
---|---|---|---|---|
pip_packages | | array[string] | No | Python pip packages to install. |
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
parameters | | object | No | Dictionary of parameters to use for the resource. |
defaults | | array[ConfigFilter] | No | List of default configs with filters that can be applied to a resource config. |
version | | string | No | The version of the project. |
connections | ['connections/'] | array[string] | No | List of connection definition folders used in the project. |
flows | ['flows/'] | array[string] | No | List of flow definition folders used in the project. |
profiles | ['profiles/'] | array[string] | No | List of profile definition folders used in the project. |
sources | ['src/'] | array[string] | No | List of source definition folders used in the project. |
tests | ['tests/'] | array[string] | No | List of test definition folders used in the project. |
vaults | ['vaults/'] | array[string] | No | List of vault definition folders used in the project. |
actions | ['actions/'] | array[string] | No | List of action definition folders used in the project. |
automations | ['automations/'] | array[string] | No | List of automation definition folders used in the project. |
sensors | ['sensors/'] | array[string] | No | List of sensor definition folders used in the project. |
ssh_tunnels | ['ssh_tunnels/'] | array[string] | No | List of SSH tunnel definition folders used in the project. |
applications | ['applications/'] | array[string] | No | List of application definition folders used in the project. |
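A project file that keeps most folder defaults but adds a second flow folder might be sketched like this. The project name, version, package, and the `shared_flows/` folder are hypothetical; folder keys that are left out fall back to the defaults listed in the table.

```yaml
project:
  name: analytics_project    # hypothetical project name
  version: "1.0.0"           # placeholder version string
  pip_packages:
    - requests               # example package, not required by the platform
  flows:
    - flows/                 # the documented default folder
    - shared_flows/          # hypothetical additional folder
```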
ConfigFilter
A filter used to target configuration settings to a specific flow and/or component.
Property | Default | Type | Required | Description |
---|---|---|---|---|
kind | | string ("Flow", "Component") | Yes | The kind of the resource to apply the config to. |
name | | Any of: string, array[string], RegexFilter, array[RegexFilter] | Yes | Name of the resource to apply the config to. |
flow_name | | string | No | Name of the flow to apply the config to. |
spec | | Any of: FlowSpec, ComponentSpec | No | Dictionary of parameters to use for the resource. |
ComponentSpec
Specification for configuration applied to a component at runtime based on the config filter.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane | No | Data Plane-specific configuration options for a component. |
skip | False | boolean | No | |
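Putting ConfigFilter and ComponentSpec together, a `defaults` entry that applies BigQuery clustering to a single component could be sketched as below. It is shown under a profile, which is one of the resources on this page that accepts `defaults`; the profile, flow, component, and column names are hypothetical.

```yaml
profile:
  name: dev                        # hypothetical profile name
  defaults:
    - kind: Component              # "Flow" or "Component"
      name: orders_clean           # hypothetical component name
      flow_name: sales_flow        # hypothetical flow name
      spec:                        # ComponentSpec
        skip: false
        data_plane:
          bigquery:
            cluster_by:
              - customer_id        # placeholder column name
```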
ReadComponent
A component that reads data from a data system.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane | No | Data Plane-specific configuration options for a component. |
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
flow_name | | string | No | The name of the flow that the component belongs to. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the component. |
skip_for_time_series_runs | | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. |
tests | | ComponentTestColumn | No | Defines tests to run on the data of this component. |
read | | One of: GenericFileReadComponent, LocalFileReadComponent, S3ReadComponent, GcsReadComponent, AbfsReadComponent, HttpReadComponent, MSSQLReadComponent, MySQLReadComponent, OracleReadComponent, PostgresReadComponent, SnowflakeReadComponent, BigQueryReadComponent | Yes | The read component that reads data from a data system. |
TransformComponent
A component that executes SQL or Python code to transform data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, SynapseDataPlane | No | Data Plane-specific configuration options for a component. |
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources. |
flow_name | | string | No | The name of the flow that the component belongs to. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component or not. |
data_maintenance | | DataMaintenance | No | The data maintenance configuration options for the component. |
skip_for_time_series_runs | | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. |
tests | | ComponentTestColumn | No | Defines tests to run on the data of this component. |
transform | | One of: SqlTransform, PythonTransform, SnowparkTransform, PySparkTransform | Yes | The transform component that executes SQL or Python code to transform data. |
BigQueryDataPlaneOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | Any of: BigQueryRangePartitioning, BigQueryTimePartitioning | No | Partition By clause for the table. |
cluster_by | | array[string] | No | Clustering keys to be added to the table. |
BigQueryRangePartitioning
Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
range | | RangeOptions | Yes | Range partitioning options. |
BigQueryTimePartitioning
Property | Default | Type | Required | Description |
---|---|---|---|---|
field | | string | Yes | Field to partition by. |
granularity | | string ("DAY", "HOUR", "MONTH", "YEAR") | Yes | Granularity of the time partitioning. |
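As a sketch, time partitioning combined with clustering might be written as follows. It assumes that `partition_by` takes the `field` and `granularity` keys directly, as the table above implies; the column names are placeholders.

```yaml
# Hypothetical fragment: only the data_plane portion of a component is shown.
data_plane:
  bigquery:
    partition_by:              # BigQueryTimePartitioning
      field: event_date        # placeholder date/timestamp column
      granularity: DAY         # one of: DAY, HOUR, MONTH, YEAR
    cluster_by:
      - customer_id            # placeholder column name
```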
RangeOptions
Property | Default | Type | Required | Description |
---|---|---|---|---|
start | | integer | Yes | Start of the range partitioning. |
end | | integer | Yes | End of the range partitioning. |
interval | | integer | Yes | Interval of the range partitioning. |
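An integer-range partition could be sketched as below, assuming `partition_by` accepts the `field` and `range` keys of BigQueryRangePartitioning directly. The column name and bounds are placeholders.

```yaml
# Hypothetical fragment: only the data_plane portion of a component is shown.
data_plane:
  bigquery:
    partition_by:              # BigQueryRangePartitioning
      field: customer_bucket   # placeholder integer column
      range:                   # RangeOptions
        start: 0
        end: 1000
        interval: 10           # width of each partition
```

In BigQuery's own integer-range partitioning the start bound is inclusive and the end bound is exclusive; whether these options map one-to-one onto that behavior is an assumption worth verifying.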