Best practices for BigQuery
This article extends our recommended best practices for BigQuery.
GCP project management​
We recommend using separate Google Cloud Platform (GCP) GCP projects for each Ascend Environment. For instance:
ottos-expeditions-development
ottos-expeditions-staging
ottos-expeditions-production
are the GCP projects we use for Otto's Expeditions.
BigQuery dataset management​
We recommend separate BigQuery datasets for each Ascend Deployment and Workspace. For instance, in Otto's Expeditions we set the following Project-level Parameters:
parameters:
data_planes:
gcp:
project_id: <your-gcp-project-id>
bigquery:
dataset: OTTOS_EXPEDITIONS_DEVELOPMENT
location: US
Then we have a workspace_template.yaml
Profile:
profile:
parameters:
gcp: $parameters.data_planes.gcp
bigquery:
$<: $parameters.data_planes.bigquery
dataset: OTTOS_EXPEDITIONS_DEVELOPMENT
Each Deployment Profile will similarly override Parameters to isolate its resources.
We further override the BigQuery dataset for each Flow in the [flow_name].yaml
(transform.yaml
in this example) file:
flow:
version: 0.1.0
parameters:
bigquery:
dataset: ${parameters.bigquery.dataset}_TRANSFORM
Need help with YAML syntax? Ask Otto!
This results in datasets isolated across GCP projects for each Ascend Environment. Datasets are additionally namespaced by the Profile (a Deployment or a user's Workspace) and Flow for better organization.
Ascend Instance Store​
We recommend using a separate dataset named ASCEND_INSTANCE
for the Ascend Instance Store in your Production GCP project. You may also separate this into its own GCP project if you prefer. The Instance's GCP identity should have the same access control as each Ascend Environment for its corresponding GCP project.
Compute resources​
We recommend implementing BigQuery slot reservations (not commitments) to manage compute resources.
Name these reservations following the pattern ASCEND_[ENVIRONMENT]_RESERVATION
:
ASCEND_DEVELOPMENT_RESERVATION
ASCEND_STAGING_RESERVATION
ASCEND_PRODUCTION_RESERVATION
This approach provides effective cost management while maintaining performance across your different environments.
Access control (IAM)​
Each Ascend Environment needs the following IAM roles for its corresponding GCP project:
bigquery.jobUser
bigquery.readSessionUser
bigquery.dataEditor
You can find the Environment identity details in the settings page of your Ascend Environment.