Version: 3.0.0

Expose source URI and timestamp metadata in Read Components

This guide shows you how to include additional metadata columns in your file system Read Components.

You can expose the Ascend source URI (along with any dates parsed from it) and the ingestion timestamp as columns for additional processing. These columns are hidden by default but can be useful in downstream Transforms or other logic.

Example

Warning

Data types may differ across Data Planes. The example documented here uses BigQuery syntax with a Google Cloud Storage (GCS) bucket. Adjust the syntax for your Data Plane as needed.

The example below shows how to add the metadata columns using the _ascend_source and _ascend_ingested_at fields in YAML:

component:
  read:
    connection: read_gcs_lake
    gcs:
      path: ottos-expeditions/lakev0/generated/events/feedback_ascenders.parquet/year=
      include:
        - glob: "*/month=*/day=*/*.parquet"
    columns:
      - "*" # Load all columns
      - _ascend_source:
          cast: string
          as: raw_uri
      - _ascend_source:
          cast: DATETIME
          as: parsed_date_from_raw_uri
          load_expr: >
            DATETIME(CAST(REGEXP_EXTRACT(column, r'year=(\d{4})') AS INT),
            CAST(REGEXP_EXTRACT(column, r'month=(\d{1,2})') AS INT),
            CAST(REGEXP_EXTRACT(column, r'day=(\d{1,2})') AS INT),
            0, 0, 0)
      - _ascend_ingested_at:
          cast: datetime
          as: ingested_at

This example demonstrates the following steps:

  • Specify a file system Connection with connection: read_gcs_lake
  • Define the base path in GCS with path: ottos-expeditions/lakev0/generated/events/feedback_ascenders.parquet/year=
  • Include specific files using glob patterns with include: - glob: "*/month=*/day=*/*.parquet"
  • Load all existing data columns with - "*"
  • Add the source URI as a string column named raw_uri using _ascend_source
  • Extract date components from the URI path using regular expressions
  • Parse these components into a DATETIME column named parsed_date_from_raw_uri
  • Include the Ascend ingestion timestamp as a datetime column named ingested_at
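
To check what the date-extraction expression would produce for a given path before deploying the component, it can help to mirror its logic locally. The following is a minimal Python sketch, not part of Ascend: the function name `parse_date_from_uri`, the bucket name, and the file name in the sample URI are hypothetical and chosen only for illustration. It applies the same three regular expressions as the `load_expr` above and builds a midnight datetime from the captured values.

```python
import re
from datetime import datetime


def parse_date_from_uri(uri: str) -> datetime:
    """Mirror the load_expr: extract year/month/day from a Hive-style path."""
    parts = {}
    # Same patterns as the REGEXP_EXTRACT calls in the YAML example.
    for name, pattern in (
        ("year", r"year=(\d{4})"),
        ("month", r"month=(\d{1,2})"),
        ("day", r"day=(\d{1,2})"),
    ):
        match = re.search(pattern, uri)
        if match is None:
            # Assumption: fail loudly here. BigQuery's REGEXP_EXTRACT
            # returns NULL instead when there is no match.
            raise ValueError(f"no {name}= segment in {uri!r}")
        parts[name] = int(match.group(1))
    # Hour, minute, and second are fixed at 0, as in the load_expr.
    return datetime(parts["year"], parts["month"], parts["day"], 0, 0, 0)


# Hypothetical source URI; the bucket and file names are made up.
sample_uri = (
    "gs://example-bucket/ottos-expeditions/lakev0/generated/events/"
    "feedback_ascenders.parquet/year=2024/month=3/day=7/part-0.parquet"
)
print(parse_date_from_uri(sample_uri))  # prints 2024-03-07 00:00:00
```

Note that the sketch raises an error when a path segment is missing, whereas the BigQuery expression would yield NULL for that component; adjust the handling to match how your Data Plane should treat unpartitioned files.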