Version: 3.0.0

Azure Databricks workspace

In this guide, you will create an Azure Databricks workspace with Unity Catalog enabled for use with Ascend.

Prerequisites

To complete this how-to guide you need:

  • An Azure account
    • The ability to create Azure resources (may require elevated permissions for some steps)
    • The ability to create Azure Databricks resources (may require elevated permissions for some steps)
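  • A terminal with the tools used by the commands in this guide:
    • The Azure CLI (az)
    • The Databricks CLI (databricks)
    • jq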
important

You can use the GUIs for the setup below, though we recommend using the CLIs for automation and repeatability.

Create an Azure Databricks workspace

If you already have an Azure Databricks workspace, set the variables below so they point to your existing Azure resource group and Azure Databricks workspace. Skip the workspace creation command, but still set the DATABRICKS_WORKSPACE_URL variable and continue with the rest of the guide.

Set up variables

tip

For better organization, we recommend appending the Azure region (e.g., eastus, westus2) to your resource names.

LOCATION="westus2"
RESOURCE_GROUP="ascend-dbx-$LOCATION"
DATABRICKS_WORKSPACE="ascend-dbx-$LOCATION"
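
Every later command interpolates these variables, so an unset one tends to surface as a confusing CLI error. A minimal sketch of a guard, assuming a POSIX shell; check_vars is a hypothetical helper and not part of the Azure or Databricks CLIs:

```shell
# check_vars: hypothetical helper that fails if any named variable is unset or empty.
check_vars() {
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing variable: $name" >&2
      return 1
    fi
  done
}

# Usage: check_vars LOCATION RESOURCE_GROUP DATABRICKS_WORKSPACE
```

The helper takes variable names rather than values, so the error message can say which variable is missing.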

Create an Azure resource group

Create an Azure resource group:

tip

This command is idempotent, so you can run it without issue even if the resource group already exists.

az group create --name $RESOURCE_GROUP --location $LOCATION

Create an Azure Databricks workspace

Create the Azure Databricks workspace:

danger

The az databricks workspace create command is not idempotent, unlike most az * create commands. Skip the workspace creation command below if you have an existing Databricks workspace with the same name in $RESOURCE_GROUP. Ensure its location matches $LOCATION.

You can check for a conflicting Azure Databricks workspace in your resource group with:

az databricks workspace list \
--resource-group $RESOURCE_GROUP \
--query "[].{name:name,location:location,resourceGroup:resourceGroup}" \
-o table

If no workspace with that name is listed, create it:

az databricks workspace create \
--name $DATABRICKS_WORKSPACE \
--resource-group $RESOURCE_GROUP \
--location $LOCATION \
--sku premium
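
Because the create command is not idempotent, a script can look up the workspace first and create it only when the lookup fails. A rough sketch, assuming az databricks workspace show exits non-zero when the workspace does not exist; ensure_workspace is a hypothetical helper name. Note that a lookup that fails for another reason, such as an expired login or missing permissions, would also fall through to the create call:

```shell
# ensure_workspace NAME RESOURCE_GROUP LOCATION (hypothetical helper)
ensure_workspace() {
  # Assumption: a non-zero exit from "show" means the workspace is absent.
  if az databricks workspace show \
      --name "$1" --resource-group "$2" >/dev/null 2>&1; then
    echo "Workspace $1 already exists in $2; skipping creation."
  else
    az databricks workspace create \
      --name "$1" --resource-group "$2" \
      --location "$3" --sku premium
  fi
}

# Usage: ensure_workspace "$DATABRICKS_WORKSPACE" "$RESOURCE_GROUP" "$LOCATION"
```
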
danger

Set $DATABRICKS_WORKSPACE_URL even if you are using an existing Databricks workspace.

DATABRICKS_WORKSPACE_URL="https://$(az databricks workspace show \
--name $DATABRICKS_WORKSPACE \
--resource-group $RESOURCE_GROUP \
--query "workspaceUrl" -o tsv)"
echo $DATABRICKS_WORKSPACE_URL
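
If the az databricks workspace show lookup fails or returns nothing, the variable is left holding only the https:// prefix. A small sketch of a sanity check, assuming a POSIX shell; valid_workspace_url is a hypothetical helper:

```shell
# valid_workspace_url URL (hypothetical helper)
valid_workspace_url() {
  case "$1" in
    https://?*) return 0 ;;  # prefix followed by at least one character
    *)
      echo "Workspace URL looks empty: '$1'" >&2
      return 1
      ;;
  esac
}

# Usage: valid_workspace_url "$DATABRICKS_WORKSPACE_URL" || echo "Re-check the workspace name and resource group"
```
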

Open the Databricks workspace in your browser (open is the macOS command; on Linux, xdg-open is the usual equivalent):

open $DATABRICKS_WORKSPACE_URL

Set up the Databricks CLI

Follow these steps in the Databricks UI to create a Personal Access Token (PAT) for use in the CLI.

Configure the CLI and enter the PAT you just created when prompted:

databricks configure --host $DATABRICKS_WORKSPACE_URL
tip

The Databricks CLI uses profiles to work with multiple workspaces.
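
As a rough illustration of profiles: the CLI accepts a --profile flag, so a configuration can be stored under a name and selected on later calls. The profile name ascend-dbx and the dbx wrapper below are hypothetical:

```shell
# Hypothetical profile name; pick any name that identifies the workspace.
DATABRICKS_PROFILE="ascend-dbx"

# dbx: hypothetical wrapper that runs every CLI call against that profile.
dbx() {
  databricks "$@" --profile "$DATABRICKS_PROFILE"
}

# Usage:
#   dbx configure --host "$DATABRICKS_WORKSPACE_URL"
#   dbx metastores list
```

A wrapper like this keeps the remaining commands unchanged apart from the command name, which helps when you switch between workspaces.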

Set up the Databricks Unity Catalog metastore

You can only have one Databricks metastore per Azure region. If one already exists in your region, you must use it. You can list all the Databricks metastores with:

warning

This command may only show metastores in the region of the Databricks workspace you are currently using.

databricks metastores list

Then check for a Databricks metastore in your Databricks workspace's region:

databricks metastores list -o json \
| jq --arg location $LOCATION \
'.[] | select(.region == $location)'
tip

If your Azure Databricks workspace already has an Azure Databricks Unity Catalog metastore assigned, you're done!

If a Databricks metastore already exists in your region, store its metastore ID in a variable:

METASTORE_ID=$(databricks metastores list -o json \
| jq -r --arg location $LOCATION \
'.[] | select(.region == $location) | .metastore_id')
echo $METASTORE_ID

Double-check that the metastore ID is set:

[ -n "$METASTORE_ID" ] && \
echo "METASTORE_ID is set and not empty, don't create a new one" || \
echo "METASTORE_ID is not set or empty, create one"

Assign the Databricks Unity Catalog metastore to the Databricks workspace

tip

If you already assigned the Databricks metastore to the Databricks workspace in the Databricks UI after creating it, you're done!

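# Name of the default catalog for the workspace
DATABRICKS_DEFAULT_CATALOG="ascend_dbx"

# Look up the metastore that serves $LOCATION
METASTORE_ID=$(databricks metastores list -o json \
| jq -r --arg location "$LOCATION" \
'.[] | select(.region == $location) | .metastore_id')
echo "$METASTORE_ID"

# Look up the ID of the Azure Databricks workspace
WORKSPACE_ID=$(az databricks workspace show \
--name "$DATABRICKS_WORKSPACE" \
--resource-group "$RESOURCE_GROUP" \
--query "workspaceId" -o tsv)
echo "$WORKSPACE_ID"

# Assign the metastore to the workspace with the default catalog
databricks metastores assign "$WORKSPACE_ID" "$METASTORE_ID" "$DATABRICKS_DEFAULT_CATALOG"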
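
If either lookup returns nothing, the assign command runs with missing arguments. A minimal sketch of a guard, assuming a POSIX shell; assign_metastore is a hypothetical wrapper around the same databricks metastores assign call:

```shell
# assign_metastore WORKSPACE_ID METASTORE_ID DEFAULT_CATALOG (hypothetical helper)
assign_metastore() {
  # Stop early if any of the three values is empty.
  if [ -z "${1:-}" ] || [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
    echo "Refusing to assign: workspace ID, metastore ID, and default catalog must all be set" >&2
    return 1
  fi
  databricks metastores assign "$1" "$2" "$3"
}

# Usage: assign_metastore "$WORKSPACE_ID" "$METASTORE_ID" "$DATABRICKS_DEFAULT_CATALOG"
```
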

Check the Databricks Unity Catalog metastore is assigned to the Databricks workspace

Check the current metastore is set to the one you just assigned:

databricks metastores current
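
To compare the result against $METASTORE_ID in a script rather than by eye, something like the following may work. It is a sketch that assumes the JSON output of databricks metastores current has a top-level metastore_id field; metastore_is_assigned is a hypothetical helper:

```shell
# metastore_is_assigned EXPECTED_ID (hypothetical helper)
metastore_is_assigned() {
  # Assumption: the JSON output exposes the ID as .metastore_id
  current_id=$(databricks metastores current -o json | jq -r '.metastore_id')
  if [ "$current_id" = "$1" ]; then
    echo "Workspace is assigned to metastore $1"
  else
    echo "Expected metastore $1 but found '$current_id'" >&2
    return 1
  fi
}

# Usage: metastore_is_assigned "$METASTORE_ID"
```
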

Unity Catalog is enabled on your workspace!

tip

Refresh the Databricks UI for the Databricks workspace in your browser after Unity Catalog setup.