BigQuery Read Component

A component that reads data from BigQuery, either from one or more tables or from the results of one or more SQL queries.

Examples

bigquery_read_component.yaml

component:
  read:
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
    connection: my-bigquery-connection
    preserve_case: true

bigquery_read_component_multiple_queries.yaml

component:
  read:
    connection: my-bigquery-connection
    bigquery:
      queries:
        - SELECT * FROM dataset1.table1
        - SELECT * FROM dataset2.table2

bigquery_read_multiple_tables.yaml

component:
  read:
    connection: my-bigquery-connection
    bigquery:
      tables:
        - name: table1
          dataset: my_dataset
        - name: table2
          dataset: my_dataset

BigQueryReadComponent

BigQueryReadComponent is defined beneath the following ancestor nodes in the YAML structure:

Component
ReadComponent

Below are the properties for the BigQueryReadComponent. The types they reference are described in the Property Details section further down this page.

Property	Type	Required	Description
dependencies	array[None]	No	List of dependencies that must complete before this component runs.
event_time	string	No	Timestamp column in the component output used to represent event time.
connection	string	No	The name of the connection to use for reading data.
columns	array[None]	No	A list specifying the columns to read from the source and transformations to make during read.
normalize	boolean	No	A boolean flag indicating if the output column names should be normalized to a standard naming convention after reading.
preserve_case	boolean	No	A boolean flag indicating if the case of the column names should be preserved after reading.
uppercase	boolean	No	A boolean flag indicating if the column names should be transformed to uppercase after reading.
strategy	Any of: full IncrementalReadStrategy PartitionedStrategy	No	Ingest strategy options.
read_options		No	Options for reading from the database or warehouse.
bigquery	Any of:	Yes

Property Details

Component

A component is a fundamental building block of a data flow. Types of components that are supported include: read, transform, task, test, and more.

Property	Default	Type	Required	Description
component		One of: CustomPythonReadComponent ApplicationComponent AliasedTableComponent ExternalTableComponent	Yes	Configuration options for the component.

ReadComponent

A component that reads data from a data system.

Property	Type	Required	Description
data_plane	One of: SnowflakeDataPlane BigQueryDataPlane DatabricksDataPlane	No	Data Plane-specific configuration options for a component.
skip	boolean	No	A boolean flag indicating whether to skip processing for the component or not.
retry_strategy		No	The retry strategy configuration options for the component if any exceptions are encountered.
description	string	No	A brief description of what the model does.
metadata		No	Meta information of a resource. In most cases it doesn't affect the system behavior but may be helpful to analyze project resources.
name	string	Yes	The name of the model.
flow_name	string	No	The name of the flow that the component belongs to.
data_maintenance		No	The data maintenance configuration options for the component.
tests		No	Defines tests to run on the data of this component.
read	One of: GenericFileReadComponent LocalFileReadComponent SFTPReadComponent S3ReadComponent GcsReadComponent AbfsReadComponent HttpReadComponent MSSQLReadComponent MySQLReadComponent OracleReadComponent PostgresReadComponent SnowflakeReadComponent BigQueryReadComponent DatabricksReadComponent	Yes	The read component that reads data from a data system.
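
To show how the ReadComponent properties relate to the `read` block, here is a minimal sketch. It assumes that `description` and `skip` sit directly under `component`, as siblings of `read`, which is how the table above lists them. The description text is a hypothetical value.

```yaml
component:
  # Assumed placement: ReadComponent properties are siblings of `read`.
  description: Reads the raw orders table from BigQuery.  # hypothetical text
  skip: false  # set to true to skip processing for this component
  read:
    connection: my-bigquery-connection
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
```

The examples at the top of this page omit `name` even though the table marks it as required, so this sketch leaves it out as well.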

MultipleTablesWithDataset

Options for reading from multiple tables in a specific dataset. Useful for platforms like BigQuery.

Property	Default	Type	Required	Description
tables		array[None]	Yes	List of tables (in specified datasets) to read data from.

ComponentColumn

Component column expression definition.

No properties defined.

DatabaseReadOptions

Options for reading from a database or warehouse.

Property	Default	Type	Required	Description
chunk_size	100000	integer	No	Number of rows to read from the table at a time.
parallel_read			No	Options for reading from the source in parallel.
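
As an illustration of the options above, the following sketch lowers `chunk_size` for a single-table read. It assumes `read_options` is placed under `read`, next to `bigquery`, the same way `connection` and `preserve_case` are in the examples at the top of this page.

```yaml
component:
  read:
    connection: my-bigquery-connection
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
    # Assumed placement: beside `bigquery`, like `connection`.
    read_options:
      chunk_size: 50000  # rows read at a time; the documented default is 100000
```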

IncrementalReadStrategy

Incremental read strategy for database read components. It combines the replication strategy, which defines how new data is read from the source, with the incremental strategy, which defines how that new data is materialized in the output.

Property	Type	Required	Description
replication	One of: cdc incremental	No	Replication strategy to use for data synchronization.
incremental	Any of: append MergeStrategy	Yes	Incremental processing strategy.
on_schema_change	string ("ignore", "fail", "append_new_columns", "sync_all_columns")	No	Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.

CdcReplication

Specifies if Change Data Capture (CDC) is the replication strategy.

Property	Default	Type	Required	Description
cdc			No	Resource for Change Data Capture (CDC), enabling incremental data capture based on changes.

CdcOptions

No properties defined.

IncrementalReplication

Specifies if incremental data reading is the replication strategy.

Property	Default	Type	Required	Description
incremental			No	Resource for incremental data reading based on a specific column.

IncrementalColumn

Specifies the column to be used for incremental reading.

Property	Default	Type	Required	Description
column_name		string	Yes	Name of the column to use for tracking incremental updates to the data.
start_value		Any of: string integer number	No	Initial value to start reading data from the specified column.
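
The pieces above (IncrementalReadStrategy, IncrementalReplication, IncrementalColumn) can be combined roughly as follows. This is a sketch, not a verified configuration: the nesting under `strategy` is inferred from the property tables, and the column name `updated_at` and the start value are hypothetical.

```yaml
component:
  read:
    connection: my-bigquery-connection
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
    strategy:
      # How new data is read from the source.
      replication:
        incremental:
          column_name: updated_at    # hypothetical tracking column
          start_value: "2024-01-01"  # hypothetical initial value
      # How the new data is materialized in the output.
      incremental: append
      on_schema_change: append_new_columns  # defaults to 'fail' when omitted
```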

MultipleQueries

Options to define one or more arbitrary select statements. The outputs of the queries are unioned together, so every query must return the same schema.

Property	Default	Type	Required	Description
queries		array[string]	No	List of SQL queries to execute for reading data.

ParallelReadOptions

Options for reading from a source in parallel. Ascend will logically partition the source data based on the partition column and max partitions, and then read each partition in parallel. If lower and upper bounds for the partition column are provided, they are used as hints to guide partitioning by dividing the range into max_partitions roughly equal partitions.

Property	Default	Type	Required	Description
partition_column		string	Yes	The name of the column to partition the data by. Select a column that could be used to partition the data into smaller chunks and that results in the lowest skew between partitions in terms of record count. This column must either be an integer or timestamp column.
max_partitions	-1	integer	No	The maximum number of partitions to read concurrently from the source. Since this translates directly to the number of concurrent connections to the source, care should be taken to select a value that does not exceed the source system's connection or other resource limits. A value of -1 means that the value is chosen automatically based on the source.
partition_lower_bound		Any of: integer string	No	The lower bound of the partition column. If not provided, the minimum value of the partition column will be used.
partition_upper_bound		Any of: integer string	No	The upper bound of the partition column. If not provided, the maximum value of the partition column will be used.
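
A sketch of a parallel read is shown here, assuming `parallel_read` nests under `read_options` as the DatabaseReadOptions table suggests. The column name `id` and the bounds are hypothetical. With these values, the range 1 to 1,000,000 would be split into 8 partitions of roughly 125,000 ids each.

```yaml
component:
  read:
    connection: my-bigquery-connection
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
    read_options:
      parallel_read:
        partition_column: id            # hypothetical integer column with low skew
        max_partitions: 8               # also the number of concurrent connections
        partition_lower_bound: 1        # hint; the column minimum is used if omitted
        partition_upper_bound: 1000000  # hint; the column maximum is used if omitted
```

Keeping `max_partitions` at or below the source system's connection limit avoids exhausting it, since each partition is read over its own connection.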

SingleQuery

Options to define an arbitrary select statement.

Property	Default	Type	Required	Description
query		string	No	SQL query to execute for reading data.
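
By analogy with the multiple-queries example at the top of this page, a single-query read might look like the sketch below. It assumes `query` is placed directly under `bigquery`. The selected columns and the filter are hypothetical.

```yaml
component:
  read:
    connection: my-bigquery-connection
    bigquery:
      # Assumed placement: `query` directly under `bigquery`, mirroring `queries`.
      query: SELECT id, name FROM my_dataset.my_table WHERE active = TRUE
```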

PartitionedStrategy

Partitioned ingest strategy. The user is expected to provide two functions: a list function that lists the partitions in the source, and a read function that reads a single partition from the source.

Property	Default	Type	Required	Description
partitioned			No	Options for partitioning data.
on_schema_change		string ("ignore", "fail", "append_new_columns", "sync_all_columns")	No	Policy to apply when schema changes are detected. Defaults to 'fail' if not provided.

PartitionedOptions

Options related to partition optimization - in particular, the policy that determines which partitions to ingest.

Property	Default	Type	Required	Description
enable_substitution_by_partition_name		boolean	Yes	Enable substitution by partition name.
output_type	table	string ("table", "view")	No	Output type for partitioned data. Must be either 'table' or 'view'. This strategy applies only to Transforms.

SCDType2Strategy

The SCD Type 2 strategy allows users to track changes to records over time, by tracking the start and end times for each version of a record. A brief overview of the strategy can be found at https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row.

Property	Default	Type	Required	Description
scd_type_2			No	Options for SCD Type 2 strategy.

SingleTableWithDataset

Options for reading from a single table in a specific dataset. Useful for platforms like BigQuery.

Property	Default	Type	Required	Description
table		Any of:	Yes	Table (in specified dataset) to read data from.

PartitionedTableWithDatasetOptions

Options for reading from a partitioned table in a dataset.

Property	Type	Required	Description
partitioning	Any of: time	Yes	Describes how the source table is partitioned.
name	string	Yes	Name of the table to be read.
dataset	string	No	Dataset of the table, specific to platforms like BigQuery.

InputComponent

Specification for input components, including how partitioning behaviors should be handled. This additional metadata is required when a component is used as an input to other components in a flow.

Property	Type	Required	Description
flow	string	Yes	Name of the parent flow that the input component belongs to.
name	string	Yes	The input component name.
alias	string	No	The alias to use for the input component.
partition_spec	Any of: string ("full_reduction", "map")	No	The type of partitioning to apply to the component's input data before processing the component's logic. Input partitioning is applied before the component's logic is executed.
where	string	No	An optional filter condition to apply to the input component's data.
partition_binding	Any of: string	No	An optional partition binding specification to apply to the component on a per-output-partition basis against other inputs' partitions.

MergeStrategy

A strategy that involves merging new data with existing data by updating existing records that match the unique key.

Property	Default	Type	Required	Description
merge			No	Options for merge strategy.

KeyOptions

Column options needed for merge and SCD Type 2 strategies, such as unique key and deletion column name.

Property	Type	Required	Description
unique_key	string	Yes	Column or comma-separated set of columns used as a unique identifier for records, aiding in the merge process.
deletion_column	string	No	Column name used in the upstream source for soft-deleting records. Used when replicating data from a source that supports soft-deletion. If provided, the merge strategy will be able to detect deletions and mark them as deleted in the destination. If not provided, the merge strategy will not be able to detect deletions.
merge_update_columns	Any of: string array[string]	No	List of columns to include when updating values in merge. These columns are mutually exclusive with respect to the columns in `merge_exclude_columns`.
merge_exclude_columns	Any of: string array[string]	No	List of columns to exclude when updating values in merge. These columns are mutually exclusive with respect to the columns in `merge_update_columns`.
incremental_predicates	Any of: string array[string]	No	List of conditions to filter incremental data.
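
The key options above might be used in a merge-based incremental read as sketched below. The sketch assumes that the KeyOptions properties nest directly under `merge`, and that `merge` is given as the value of `strategy.incremental`. All column names are hypothetical.

```yaml
component:
  read:
    connection: my-bigquery-connection
    bigquery:
      table:
        name: my_table
        dataset: my_dataset
    strategy:
      replication:
        incremental:
          column_name: updated_at    # hypothetical tracking column
      incremental:
        merge:
          unique_key: id               # hypothetical key used to match existing records
          deletion_column: is_deleted  # hypothetical soft-delete flag in the source
          merge_exclude_columns:       # do not combine with merge_update_columns
            - created_at
```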

PartitionBinding

Property	Default	Type	Required	Description
logical_operator		string ("AND", "OR")	No	The logical operator used to combine the provided partition binding predicates.
predicates		array[string]	No	The list of partition binding predicates to apply to the input component's data.

RepartitionSpec

Specification for repartitioning operations on the input component's data.

Property	Default	Type	Required	Description
repartition			No	Options for repartitioning the input component's data.

RepartitionOptions

Options for repartitioning the input component's data.

Property	Default	Type	Required	Description
partition_by		string	Yes	The column to partition by.
granularity		string	Yes	The granularity to use for the partitioning.

TableWithDatasetOptions

Options for reading from a specific table in a dataset.

Property	Default	Type	Required	Description
name		string	Yes	Name of the table to be read.
dataset		string	No	Dataset of the table, specific to platforms like BigQuery.

TimePartitionSelection

Options for selecting partitions based on time.

Property	Default	Type	Required	Description
time			Yes	Filter to apply to the partitions.

TimePartitionFilters

Property	Default	Type	Required	Description
include		array[None]	No	Include partitions that match the filter.
exclude		array[None]	No	Exclude partitions that match the filter.

TimePartitionFilter

Property	Type	Required	Description
after	string	No	Match partitions whose partition value is after this date.
before	string	No	Match partitions whose partition value is before this date.
on_or_after	string	No	Match partitions whose partition value is on or after this date.
on_or_before	string	No	Match partitions whose partition value is on or before this date.
since		No	Match partitions with partition values that fall within the specified time interval up to the current moment.
eq	Any of: string integer	No	Match partitions whose partition ID matches this value. Examples for BigQuery: year partition 2001, month partition 200102, day partition 20010201, hour partition 2001020101.
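
To illustrate how these filters might fit into a partitioned table read, here is a sketch. It assumes the nesting `table.partitioning.time.include` and `table.partitioning.time.exclude` implied by PartitionedTableWithDatasetOptions, TimePartitionSelection, and TimePartitionFilters. The table name and the date format are hypothetical.

```yaml
component:
  read:
    connection: my-bigquery-connection
    bigquery:
      table:
        name: events               # hypothetical day-partitioned table
        dataset: my_dataset
        partitioning:
          time:
            include:
              - on_or_after: "2024-01-01"  # assumed date format
            exclude:
              - eq: 20240115               # day partition ID, as in the `eq` examples
```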

TimeDelta

Property	Type	Required	Description
seconds	integer	No	The number of seconds.
minutes	integer	No	The number of minutes.
hours	integer	No	The number of hours.
days	integer	No	The number of days.
weeks	integer	No	The number of weeks.
months	integer	No	The number of months.
years	integer	No	The number of years.
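
The `since` filter in TimePartitionFilter lists no type. The sketch below assumes it accepts a TimeDelta, so a rolling window over the last seven days could be expressed as shown. The table name is hypothetical.

```yaml
component:
  read:
    connection: my-bigquery-connection
    bigquery:
      table:
        name: events               # hypothetical day-partitioned table
        dataset: my_dataset
        partitioning:
          time:
            include:
              # Assumption: `since` takes TimeDelta fields such as `days`.
              - since:
                  days: 7
```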
