Expose source URI and timestamp metadata in Read Components
This guide shows you how to include additional metadata columns in your file system Read Components.
You can expose the Ascend source URI (along with any parsed dates from the URI) and ingest timestamp to use for additional processing. These columns are hidden by default but may be helpful for downstream Transforms or other logic.
Example
Warning: Data types may differ across Data Planes. The example documented here uses BigQuery syntax with a Google Cloud Storage (GCS) bucket. Update the syntax as needed for your Data Plane.
The example below shows how to add the metadata columns using the _ascend_source and _ascend_ingested_at fields in YAML:
component:
  read:
    connection: read_gcs_lake
    gcs:
      path: ottos-expeditions/lakev0/generated/events/feedback_ascenders.parquet/year=
      include:
        - glob: "*/month=*/day=*/*.parquet"
    columns:
      - "*" # Load all columns
      - _ascend_source:
          cast: string
          as: raw_uri
      - _ascend_source:
          cast: DATETIME
          as: parsed_date_from_raw_uri
          load_expr: >
            DATETIME(CAST(REGEXP_EXTRACT(column, r'year=(\d{4})') AS INT),
                     CAST(REGEXP_EXTRACT(column, r'month=(\d{1,2})') AS INT),
                     CAST(REGEXP_EXTRACT(column, r'day=(\d{1,2})') AS INT),
                     0, 0, 0)
      - _ascend_ingested_at:
          cast: datetime
          as: ingested_at
This example demonstrates the following steps:
- Specify a file system Connection with connection: read_gcs_lake
- Define the base path in GCS with path: ottos-expeditions/lakev0/generated/events/feedback_ascenders.parquet/year=
- Include specific files using a glob pattern with glob: "*/month=*/day=*/*.parquet" under include
- Load all existing data columns with "*"
- Add the source URI as a string column named raw_uri using _ascend_source
- Extract date components from the URI path using regular expressions
- Parse these components into a DATETIME column named parsed_date_from_raw_uri
- Include the Ascend ingestion timestamp as a datetime column named ingested_at
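To see what the date-extraction step does to a single URI, the following Python sketch applies the same three regular expressions as the load_expr above and builds a midnight datetime from the results. It is an illustration only and is not how Ascend or BigQuery evaluates the expression: the function name parse_partition_date, the bucket name example-bucket, and the file name part-0.parquet are hypothetical, and raising ValueError when a segment is missing is a choice made for this sketch.

```python
import re
from datetime import datetime

# Same patterns as the load_expr in the YAML example.
PARTITION_PATTERNS = (
    ("year", r"year=(\d{4})"),
    ("month", r"month=(\d{1,2})"),
    ("day", r"day=(\d{1,2})"),
)


def parse_partition_date(uri: str) -> datetime:
    """Build a midnight datetime from the year=/month=/day= segments of a URI.

    Hypothetical helper for illustration; not part of any Ascend API.
    """
    parts = {}
    for name, pattern in PARTITION_PATTERNS:
        match = re.search(pattern, uri)
        if match is None:
            # Sketch-only behavior: fail loudly when a segment is absent.
            raise ValueError(f"no {name}= segment found in {uri!r}")
        parts[name] = int(match.group(1))
    # Hour, minute, and second are fixed at 0, as in the load_expr.
    return datetime(parts["year"], parts["month"], parts["day"], 0, 0, 0)


# Hypothetical source URI; the bucket and file names are made up.
example_uri = (
    "gs://example-bucket/ottos-expeditions/lakev0/generated/events/"
    "feedback_ascenders.parquet/year=2024/month=3/day=7/part-0.parquet"
)
parsed = parse_partition_date(example_uri)
print(parsed.isoformat())  # 2024-03-07T00:00:00
```

Running a few representative URIs through a check like this before deploying the component can help confirm that the patterns match your path layout, for example that month and day segments without zero padding are still captured.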