# GCS Read Component
Component for reading files from a Google Cloud Storage bucket.
## Examples

**gcs_read_csv_config.yaml**: read CSV files that have a header row.

```yaml
component:
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/csv/files
      include:
        - suffix: .csv
      parser:
        csv:
          has_header: true
```
**gcs_read_json_date_format.yaml**: read JSON files with explicit date and timestamp formats.

```yaml
component:
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/json/files
      include:
        - suffix: .json
      parser:
        json:
          date_format: "%Y-%m-%d"
          timestamp_format: "%Y-%m-%dT%H:%M:%S"
```
**gcs_read_parquet_shorthand.yaml**: read Parquet files using the shorthand parser form, selecting files by modification time.

```yaml
component:
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/parquet/files
      parser: parquet
      include:
        - modified_at:
            on_or_after: '2023-01-01T00:00:00Z'
            before: '2025-01-01T00:00:00Z'
```
## GcsReadComponent

GcsReadComponent is defined beneath the following ancestor nodes in the YAML structure:

- Component
- ReadComponent

Below are the properties for GcsReadComponent. Each property links to its details section further down this page.
| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| dependencies | | array[None] | No | List of dependencies that must complete before this Component runs. |
| event_time | | string | No | Timestamp column in the Component output used to represent Event time. |
| connection | | string | No | Name of the Connection to use for reading data. |
| columns | | array[None] | No | List specifying the columns to read from the source and the transformations to apply during the read. |
| normalize | | boolean | No | Whether output column names should be normalized to a standard naming convention after reading. |
| preserve_case | | boolean | No | Whether the case of column names should be preserved after reading. |
| uppercase | | boolean | No | Whether column names should be transformed to uppercase after reading. |
| strategy | | PartitionedStrategy | No | Ingest strategy to use when reading files. |
| gcs | | | Yes | Options for reading files from a Google Cloud Storage bucket. |
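For illustration, here is a minimal sketch combining several of these optional properties with the required `gcs` block, following the structure of the examples above; the connection name and the `updated_at` column are hypothetical:

```yaml
component:
  read:
    connection: my-gcs-connection   # hypothetical Connection name
    event_time: updated_at          # hypothetical timestamp column in the output
    normalize: true                 # normalize output column names
    gcs:
      path: /path/to/files
      include:
        - suffix: .csv
```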
## Property Details
### Component

A Component is a fundamental building block of a data Flow. Supported Component types include: Read, Transform, Task, Test, and more.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| component | | One of: CustomPythonReadComponent, ApplicationComponent, AliasedTableComponent, ExternalTableComponent | Yes | Component configuration options. |
### ReadComponent

Component that reads data from a system.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| data_plane | | One of: SnowflakeDataPlane, BigQueryDataPlane, DuckdbDataPlane, DatabricksDataPlane | No | Data Plane-specific configuration options for Components. |
| skip | | boolean | No | Whether to skip processing for the Component. |
| retry_strategy | | | No | Retry strategy configuration options for the Component if any exceptions are encountered. |
| description | | string | No | Brief description of what the Component does. |
| metadata | | | No | Meta information of a resource. In most cases it doesn't affect system behavior, but it may be helpful when analyzing project resources. |
| name | | string | Yes | Name of the Component. |
| flow_name | | string | No | Name of the Flow that the Component belongs to. |
| data_maintenance | | | No | Data maintenance configuration options for the Component. |
| tests | | | No | Defines tests to run on this Component's data. |
| read | | One of: GenericFileReadComponent, LocalFileReadComponent, SFTPReadComponent, S3ReadComponent, GcsReadComponent, AbfsReadComponent, HttpReadComponent, MSSQLReadComponent, MySQLReadComponent, OracleReadComponent, PostgresReadComponent, SnowflakeReadComponent, BigQueryReadComponent, DatabricksReadComponent | Yes | Read component that reads data from a system. |
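Based on the nesting shown in the examples above, these ReadComponent properties sit as siblings of the `read` block under `component`. A hedged sketch of that layout; the name, description, and Flow name are hypothetical:

```yaml
component:
  name: gcs_orders_raw              # hypothetical Component name
  description: Raw order files landed in GCS.
  flow_name: orders_ingest          # hypothetical Flow name
  skip: false                       # process this Component normally
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/orders
```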
### FileReadOptionsBase

Options for locating and parsing files from a specified directory or file path, including file-selection criteria and the parser to use.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| path | | string | Yes | Path to the directory or file to read. The path is relative to the connection's root directory; it cannot be an absolute path or traverse outside the root directory. |
| exclude | | array[None] | No | List of conditions to exclude specific files from processing. |
| include | | array[None] | No | List of conditions to include specific files for processing. |
| parser | auto | One of: auto, avro, feather, orc, parquet, pickle, text, xml, csv, excel, json | No | Parser Resource for reading the files. Defaults to `auto`. To set parser-specific options, use the parser name as a child object. |
| archive | | Any of: | No | Configuration for archive files for processing. |
| load_strategy | | | No | Strategy for loading files, including limits on the number and size of files. |
| time_based_file_selection | | Any of: last_modified | No | Method to use for file selection based on a time window. |
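A sketch of how these file-selection options compose, assuming `exclude` accepts the same condition shapes as `include` (as the suffix examples above suggest); the paths and suffixes are illustrative:

```yaml
component:
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/landing        # relative to the connection's root directory
      include:
        - suffix: .csv              # only process CSV files
      exclude:
        - suffix: .tmp              # skip in-progress files (hypothetical convention)
      parser: auto                  # shorthand form; auto is also the default
```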
### AutoParser

Parser that automatically detects the file format and settings for parsing.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| hive_partitioning | False | boolean | No | Whether to extract hive partitioning from the full path of the file and use it as partition columns. If true, the partition columns are added to the table as additional columns; the dataset should not contain any columns with the same names as the partition columns. If false, the partition columns are not added. Defaults to false. For example, when set to true, directory names with key=value pairs like "/year=2009/month=11" are parsed into partition columns named "year" and "month" with the values "2009" and "11" respectively. |
| auto | | | Yes | Options for automatically detecting the file format. None need to be specified. |
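Reading the table above, `hive_partitioning` sits alongside the `auto` options object inside `parser`. A sketch under that assumption, for files laid out in key=value directories; the empty mapping reflects that no auto-detection options need to be specified:

```yaml
component:
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/partitioned    # e.g. contains year=2009/month=11/... subdirectories
      parser:
        hive_partitioning: true     # adds year and month as partition columns
        auto: {}                    # no auto-detection options set
```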
### AvroParser

Parser for the Avro file format.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| hive_partitioning | False | boolean | No | Whether to extract hive partitioning from the full path of the file and use it as partition columns. If true, the partition columns are added to the table as additional columns; the dataset should not contain any columns with the same names as the partition columns. If false, the partition columns are not added. Defaults to false. For example, when set to true, directory names with key=value pairs like "/year=2009/month=11" are parsed into partition columns named "year" and "month" with the values "2009" and "11" respectively. |
| avro | | | Yes | Parsing options for Avro files. |
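By analogy with the CSV example at the top of the page, a sketch of selecting the Avro parser explicitly; the empty mapping assumes no Avro-specific options are needed:

```yaml
component:
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/avro/files
      include:
        - suffix: .avro
      parser:
        avro: {}                    # no Avro parsing options set
```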
### CsvParser

Parser for CSV files.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| hive_partitioning | False | boolean | No | Whether to extract hive partitioning from the full path of the file and use it as partition columns. If true, the partition columns are added to the table as additional columns; the dataset should not contain any columns with the same names as the partition columns. If false, the partition columns are not added. Defaults to false. For example, when set to true, directory names with key=value pairs like "/year=2009/month=11" are parsed into partition columns named "year" and "month" with the values "2009" and "11" respectively. |
| csv | | | Yes | Parsing options for CSV files. |
### CsvParserOptions

Parsing options for CSV files, including separators, header presence, and multi-line values.

| Property | Default | Type | Required | Description |
|---|---|---|---|---|
| all_varchar | | boolean | No | Option to skip type detection for CSV parsing and assume all columns are of type VARCHAR. |
| allow_quoted_nulls | | boolean | No | Option to allow the conversion of quoted values to NULL values. |
| auto_type_candidates | | array[string] | No | Types that the sniffer will use when detecting CSV column types. VARCHAR is always included as a fallback. |
| buffer_size | | integer | No | The buffer size in bytes for the CSV reader. |
| columns | | object with property values of type string | No | A struct specifying the column names and types. Using this option implies that auto detection is not used. |
| compression | | string | No | File compression type, detected automatically from the file extension. |
| dateformat | | string | No | Date format to use when parsing dates. |
| decimal_separator | | string | No | Decimal separator of numbers. |
| delim | | string | No | Delimiter of the CSV file. |
| types | | Any of: | | |
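A sketch combining several of these CSV options with the `has_header` flag from the first example on this page; the delimiter and date format values are illustrative:

```yaml
component:
  read:
    connection: my-gcs-connection
    gcs:
      path: /path/to/csv/files
      parser:
        csv:
          has_header: true          # files include a header row
          delim: ";"                # semicolon-separated values
          dateformat: "%Y-%m-%d"    # parse date columns in ISO form
          allow_quoted_nulls: true  # convert quoted empty values to NULL
```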