BigQuery Write Component
A component that writes data to a BigQuery table.
Examples
- bigquery_write_component_config.yaml
- remove_comments_bigquery_table.yaml
- bigquery_write_upsert_config.yaml
```yaml
component:
  write:
    connection: my_connection
    input:
      name: data_input
      flow: my_flow
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
```
```yaml
component:
  write:
    connection: my-bigquery-connection # The name of the connection to use for writing data.
    input:
      flow: my_flow # Name of the parent flow that the input component belongs to.
      name: my_input_component # The input component name.
    strategy:
      partitioned:
        mode: sync # Partitioned write mode: 'sync' combines 'insert_overwrite' with deleting partitions that were deleted at the source.
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
```
```yaml
component:
  write:
    bigquery:
      table:
        name: my_table # Name of the table to be written to.
        dataset: my_dataset # Dataset of the table, specific to platforms like BigQuery.
    connection: my-bigquery-connection # The name of the connection to use for writing data.
    input:
      flow: my_flow # Name of the parent flow that the input component belongs to.
      name: my_input_component # The input component name.
    strategy:
      incremental:
        mode: upsert # Specifies the mode to use when incrementally writing data.
        column: updated_at # Name of the column to use for tracking incremental updates to the data.
        change_detection:
          unique_key: id_column # Column or set of columns used as a unique identifier for records.
```
BigQueryWriteComponent
BigQueryWriteComponent is defined beneath the following ancestor nodes in the YAML structure: component → write.
Below are the properties for the BigQueryWriteComponent. Types referenced in the table are described in the Property Details section further down this page.
Property | Default | Type | Required | Description |
---|---|---|---|---|
connection | | string | Yes | The name of the connection to use for writing data. |
input | | InputComponent | Yes | The input component whose data is written. |
normalize | | boolean | No | A boolean flag indicating whether output column names should be normalized to a standard naming convention when writing. |
preserve_case | | boolean | No | A boolean flag indicating whether the case of column names should be preserved when writing. |
uppercase | | boolean | No | A boolean flag indicating whether column names should be transformed to uppercase when writing. |
strategy | full: mode: drop_and_recreate | Any of: string ("snapshot"), FullWriteStrategy, IncrementalWriteStrategyWithSchemaChange, PartitionedWriteStrategyWithSchemaChange | No | The write strategy to use: snapshot, full, incremental, or partitioned. |
bigquery | | SingleTableWithDataset | Yes | The BigQuery table (and its dataset) to write to. |
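The sketch below combines several of these properties in one write component. It is a minimal, unverified example that assumes the schema in the table above: the connection, flow, component, table, and dataset names are placeholders, and the behavior of the snapshot strategy shorthand is not described on this page.

```yaml
component:
  write:
    connection: my-bigquery-connection   # placeholder connection name
    input:
      flow: my_flow                      # placeholder flow name
      name: my_input_component           # placeholder input component name
    normalize: true                      # normalize output column names when writing
    strategy: snapshot                   # string form listed under the strategy type
    bigquery:
      table:
        name: my_table                   # placeholder table name
        dataset: my_dataset              # placeholder dataset name
```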
Property Details
Component
A component is a fundamental building block of a data flow. Supported component types include read, transform, task, test, and more.
Property | Default | Type | Required | Description |
---|---|---|---|---|
component | | One of: ReadComponent, TransformComponent, TaskComponent, SingularTestComponent, CustomPythonReadComponent, WriteComponent, CompoundComponent, AliasedTableComponent, ExternalTableComponent | Yes | Configuration options for the component. |
WriteComponent
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | No | The name of the model. |
description | | string | No | A brief description of what the model does. |
metadata | | ResourceMetadata | No | Metadata about the resource. In most cases it does not affect system behavior, but it can help when analyzing project resources. |
flow_name | | string | No | The name of the flow that the component belongs to. |
skip | | boolean | No | A boolean flag indicating whether to skip processing for the component. |
skip_for_time_series_runs | | boolean | No | A boolean flag indicating whether to skip processing for this component in time-series runs. |
write | | One of: BigQueryWriteComponent, SnowflakeWriteComponent, S3WriteComponent, MySQLWriteComponent, OracleWriteComponent | Yes | Platform-specific write configuration. |
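As a sketch of where the WriteComponent properties sit, the fragment below places description and skip_for_time_series_runs next to write under the top-level component key. It assumes the nesting shown in the examples at the top of this page; the description text and all names are illustrative only.

```yaml
component:
  description: Writes cleaned orders to BigQuery.   # illustrative description
  skip_for_time_series_runs: true                   # skip this component in time-series runs
  write:
    connection: my-bigquery-connection
    input:
      flow: my_flow
      name: my_input_component
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
```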
SingleTableWithDataset
Options for reading from or writing to a single table in a specific dataset. Useful for platforms like BigQuery.
Property | Default | Type | Required | Description |
---|---|---|---|---|
table | | TableWithDatasetOptions | Yes | Table (in the specified dataset) to read from or write to. |
FullWriteStrategy
Container for specifying the full write strategy.
Property | Default | Type | Required | Description |
---|---|---|---|---|
full | | FullWriteStrategyOptions | Yes | Options to use when fully writing data to a Write component. |
FullWriteStrategyOptions
Resource options for full writes, including mode selection.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | FullWriteModeEnum | Yes | Specifies the mode to use when fully writing data: 'drop_and_recreate' to drop the output table and recreate it. |
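Written out explicitly, the full strategy looks like the fragment below, which matches the strategy default listed in the BigQueryWriteComponent table. It is a sketch of the strategy block only and assumes it sits under component.write.

```yaml
# Fragment: nested under component.write, alongside connection, input, and bigquery.
strategy:
  full:
    mode: drop_and_recreate   # drop the output table and recreate it
```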
FullWriteModeEnum
No properties defined. Value listed in the FullWriteStrategyOptions mode description: 'drop_and_recreate'.
IncrementalWriteStrategyWithSchemaChange
Container for specifying the incremental write strategy, with configurable behavior when the schema changes.
Property | Default | Type | Required | Description |
---|---|---|---|---|
incremental | | IncrementalWriteStrategyOptions | Yes | Options to use when incrementally writing data to a Write component. |
on_schema_change | | string ("ignore", "fail", "drop_and_recreate", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. |
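Note that on_schema_change is a sibling of incremental, not a child of it. A minimal sketch, assuming a hypothetical updated_at tracking column and the append_new_columns policy:

```yaml
# Fragment: nested under component.write.
strategy:
  incremental:
    mode: append            # append new or modified records
    column: updated_at      # hypothetical column that tracks incremental updates
  on_schema_change: append_new_columns
```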
IncrementalWriteStrategyOptions
Resource options for incremental writes, including mode selection and criteria for detecting deletions and unique records.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | IncrementalWriteModeEnum | Yes | Specifies the mode to use when incrementally writing data: 'append' to append new or modified records, 'upsert' to insert new records and update modified records, and 'sync' to combine 'upsert' with deleting records that were deleted at the source. |
column | | string | Yes | Name of the column to use for tracking incremental updates to the data. |
change_detection | | UniqueKey | No | Options for detecting record changes when comparing source and destination data. |
deletion_col | | string | No | Column name to use for identifying deleted records when 'soft deletion' is used at the source. |
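The fragment below sketches a sync-mode incremental write that sets all four options. The column names updated_at, id_column, and is_deleted are hypothetical, and this page does not say whether deletion_col is required for, or only honored by, a particular mode.

```yaml
# Fragment: nested under component.write.
strategy:
  incremental:
    mode: sync                # upsert, plus delete records deleted at the source
    column: updated_at        # hypothetical change-tracking column
    change_detection:
      unique_key: id_column   # hypothetical unique identifier column
    deletion_col: is_deleted  # hypothetical soft-deletion flag column
```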
IncrementalWriteModeEnum
No properties defined. Values listed in the IncrementalWriteStrategyOptions mode description: 'append', 'upsert', 'sync'.
InputComponent
Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.
Property | Default | Type | Required | Description |
---|---|---|---|---|
flow | | string | Yes | Name of the parent flow that the input component belongs to. |
name | | string | Yes | The input component name. |
alias | | string | No | The alias to use for the input component. |
partition_spec | | Any of: string ("full_reduction", "map"), RepartitionSpec | No | The type of partitioning to apply to the component's input data. Input partitioning is applied before the component's logic is executed. |
where | | string | No | An optional filter condition to apply to the input component's data. |
partition_binding | | Any of: string, PartitionBinding | No | An optional partition binding specification, applied to the component per output partition against the partitions of other inputs. |
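A sketch of an input block that uses the optional properties. The alias and filter are hypothetical, and the filter is written as a SQL-style condition on the assumption that where accepts one; the accepted expression syntax is not specified here.

```yaml
# Fragment: nested under component.write.
input:
  flow: my_flow
  name: my_input_component
  alias: orders_src                 # hypothetical alias
  where: "status = 'active'"        # hypothetical filter; SQL-style syntax assumed
  partition_spec: full_reduction    # string form; "map" is the other listed value
```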
PartitionBinding
Property | Default | Type | Required | Description |
---|---|---|---|---|
logical_operator | | string ("AND", "OR") | No | The logical operator used to combine the provided partition binding predicates. |
predicates | | array[string] | No | The list of partition binding predicates to apply to the input component's data. |
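The fragment below only illustrates the shape of a PartitionBinding object. The predicate syntax is not documented on this page, so the predicates are placeholders rather than working expressions.

```yaml
# Fragment: nested under component.write.input.
partition_binding:
  logical_operator: AND        # combine predicates with AND ("OR" is also listed)
  predicates:
    - "<predicate_1>"          # placeholder; real predicate syntax not shown here
    - "<predicate_2>"
```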
PartitionedWriteStrategyWithSchemaChange
Container for specifying the partitioned write strategy, with configurable behavior when the schema changes.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partitioned | | PartitionedWriteStrategyOptions | Yes | Options to use when writing data in partitions to a Write component. |
on_schema_change | | string ("ignore", "fail", "drop_and_recreate", "append_new_columns", "sync_all_columns") | No | Policy to apply when schema changes are detected. |
PartitionedWriteStrategyOptions
Resource options for partitioned writes, including mode selection and the column used for partitioning.
Property | Default | Type | Required | Description |
---|---|---|---|---|
mode | | PartitionedWriteModeEnum | Yes | Specifies the mode to use when writing data in partitions: 'append' to append new or modified partitions, 'insert_overwrite' to insert new partitions and overwrite modified partitions, and 'sync' to combine 'insert_overwrite' with deleting partitions that were deleted at the source. |
partition_col | | string | No | Column name used for partitioning. Defaults to the internal Ascend partition identifier. |
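A sketch of a partitioned write in insert_overwrite mode with an explicit partition column and the fail schema-change policy. The event_date column is hypothetical; omitting partition_col falls back to the internal Ascend partition identifier, as the partition_col description states.

```yaml
# Fragment: nested under component.write.
strategy:
  partitioned:
    mode: insert_overwrite     # insert new partitions and overwrite modified ones
    partition_col: event_date  # hypothetical partition column
  on_schema_change: fail       # listed policy value; assumed to stop the write on a schema change
```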
PartitionedWriteModeEnum
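No properties defined. Values listed in the PartitionedWriteStrategyOptions mode description: 'append', 'insert_overwrite', 'sync'.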
RepartitionSpec
Specification for repartitioning operations on the input component's data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
repartition | | RepartitionOptions | No | Options for repartitioning the input component's data. |
RepartitionOptions
Options for repartitioning the input component's data.
Property | Default | Type | Required | Description |
---|---|---|---|---|
partition_by | | string | Yes | The column to partition by. |
granularity | | string | Yes | The granularity to use for the partitioning. |
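Using the object form of partition_spec, a repartition request might look like the sketch below. The event_ts column is hypothetical, and day is an assumed granularity value; the accepted granularities are not listed on this page.

```yaml
# Fragment: nested under component.write.
input:
  flow: my_flow
  name: my_input_component
  partition_spec:
    repartition:
      partition_by: event_ts   # hypothetical column to partition by
      granularity: day         # assumed value; accepted values not listed here
```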
TableWithDatasetOptions
Options for reading from or writing to a specific table in a dataset.
Property | Default | Type | Required | Description |
---|---|---|---|---|
name | | string | Yes | Name of the table to read from or write to. |
dataset | | string | No | Dataset of the table, specific to platforms like BigQuery. |
UniqueKey
Property | Default | Type | Required | Description |
---|---|---|---|---|
unique_key | | string | Yes | Column or set of columns used as a unique identifier for records. |