Azure Databricks workspace
In this guide, you will create an Azure Databricks workspace with Unity Catalog enabled for use with Ascend.
Prerequisites
To complete this how-to guide you need:
- An Azure account
- The ability to create Azure resources (may require elevated permissions for some steps)
- The ability to create Azure Databricks resources (may require elevated permissions for some steps)
- A terminal with:
  - The Azure CLI installed (`brew install azure-cli` using Homebrew)
  - The Databricks CLI installed (`brew tap databricks/tap && brew install databricks` using Homebrew)
  - jq installed (`brew install jq` using Homebrew)
You can use the GUIs for the setup below, though we recommend using the CLIs for automation and repeatability.
Create an Azure Databricks workspace
If you already have an Azure Databricks workspace, set the environment variables below to point to your existing Azure resource group and Azure Databricks workspace. Skip the workspace creation step, but still set the `DATABRICKS_WORKSPACE_URL` variable and continue with the rest of the guide.
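If you are not sure of the exact names of your existing resources, you can list candidates first. A quick sketch using the Azure CLI (no variables required yet):

```shell
# List resource groups in the current subscription
az group list --query "[].name" -o tsv

# List all Azure Databricks workspaces, with their resource groups and locations
az databricks workspace list \
  --query "[].{name:name,resourceGroup:resourceGroup,location:location}" \
  -o table
```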
Set up variables
For better organization, we recommend appending the Azure region (e.g., `eastus`, `westus2`) to your resource names.

```shell
LOCATION="westus2"
RESOURCE_GROUP="ascend-dbx-$LOCATION"
DATABRICKS_WORKSPACE="ascend-dbx-$LOCATION"
```
Create an Azure resource group
Create an Azure resource group:
This command is idempotent, so you can run it without issue even if the resource group already exists.
```shell
az group create --name $RESOURCE_GROUP --location $LOCATION
```
Create an Azure Databricks workspace
Create the Azure Databricks workspace:
The `az databricks workspace create` command is not idempotent, unlike most `az * create` commands. Skip the workspace creation command below if you already have a Databricks workspace with the same name in `$RESOURCE_GROUP`. Ensure its location matches `$LOCATION`.
You can check for a conflicting Azure Databricks workspace in your resource group with:
```shell
az databricks workspace list \
  --resource-group $RESOURCE_GROUP \
  --query "[].{name:name,location:location,resourceGroup:resourceGroup}" \
  -o table
```
```shell
az databricks workspace create \
  --name $DATABRICKS_WORKSPACE \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --sku premium
```
Set `$DATABRICKS_WORKSPACE_URL` even if you are using an existing Databricks workspace.
```shell
DATABRICKS_WORKSPACE_URL="https://$(az databricks workspace show \
  --name $DATABRICKS_WORKSPACE \
  --resource-group $RESOURCE_GROUP \
  --query "workspaceUrl" -o tsv)"
echo $DATABRICKS_WORKSPACE_URL
```
Open the Databricks workspace in your browser:
```shell
open $DATABRICKS_WORKSPACE_URL
```
Set up the Databricks CLI
Follow these steps in the Databricks UI to create a Personal Access Token (PAT) for use in the CLI.
Configure the CLI and enter the PAT you just created when prompted:
```shell
databricks configure --host $DATABRICKS_WORKSPACE_URL
```
The Databricks CLI uses profiles to work with multiple workspaces.
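Profiles let you keep credentials for several workspaces side by side. A sketch (the profile name `ascend-westus2` is an arbitrary example, not required by this guide):

```shell
# Save this workspace's host and token under a named profile
databricks configure --host $DATABRICKS_WORKSPACE_URL --profile ascend-westus2

# Later commands can then target that workspace explicitly
databricks metastores list --profile ascend-westus2
```

Without `--profile`, commands use the `DEFAULT` profile written by the plain `databricks configure` above.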
Set up the Databricks Unity Catalog metastore
You can only have one Databricks metastore per Azure region. If one already exists in your region, you must use it. You can list all the Databricks metastores with:
This command may only show metastores in the region of the Databricks workspace you are currently using.
```shell
databricks metastores list
```
And check for a Databricks metastore in your Databricks workspace's region:
```shell
databricks metastores list -o json \
  | jq --arg location $LOCATION \
  '.[] | select(.region == $location)'
```
If your Azure Databricks workspace already has an Azure Databricks Unity Catalog metastore assigned, you're done!
- Use an existing Databricks Unity Catalog metastore
- Create a new Databricks Unity Catalog metastore
If there is already a Databricks metastore in your region, you should get the metastore ID:
```shell
METASTORE_ID=$(databricks metastores list -o json \
  | jq -r --arg location $LOCATION \
  '.[] | select(.region == $location) | .metastore_id')
echo $METASTORE_ID
```
Double-check that the metastore ID is set:
```shell
[ -n "$METASTORE_ID" ] && \
  echo "METASTORE_ID is set and not empty, don't create a new one" || \
  echo "METASTORE_ID is not set or empty, create one"
```
This section walks through the steps to create a new Databricks metastore with reasonable defaults. Adjust as needed for your organization's requirements.
Start by setting variables for creating (or using an existing) Azure Storage account and container for the Azure Databricks Unity Catalog metastore. If you are creating a new storage account, update the prefix used for the Azure Storage account name:
```shell
STORAGE_ACCOUNT_PREFIX="ascenddbx"
STORAGE_ACCOUNT="$STORAGE_ACCOUNT_PREFIX$LOCATION"
STORAGE_CONTAINER="metastore"
DATABRICKS_ACCESS_CONNECTOR="ascend-dbx-connector-$LOCATION"
```
Azure storage account names must be globally unique (among other naming constraints). Update the `STORAGE_ACCOUNT_PREFIX` variable above to identify your organization and avoid conflicts with others following this guide.
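You can check whether your chosen name is still free before creating anything. A small sketch using the Azure CLI:

```shell
# nameAvailable is true if no storage account anywhere already uses this name;
# if false, the reason field explains why (taken, or invalid characters/length)
az storage account check-name --name $STORAGE_ACCOUNT -o table
```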
Then create the Azure storage account for use by the Databricks metastore:
```shell
az storage account create \
  --name $STORAGE_ACCOUNT \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true
```
And a container for the root of the Databricks metastore:
```shell
az storage container create \
  --account-name $STORAGE_ACCOUNT \
  --name $STORAGE_CONTAINER
```
Create the Azure Databricks access connector:
```shell
ACCESS_CONNECTOR_APP_ID=$(az databricks access-connector create \
  --name $DATABRICKS_ACCESS_CONNECTOR \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --identity-type SystemAssigned \
  --query "identity.principalId" -o tsv)
echo $ACCESS_CONNECTOR_APP_ID
```
Get the storage account URI (resource ID):
```shell
STORAGE_ACCOUNT_URI=$(az storage account show \
  --name $STORAGE_ACCOUNT \
  --resource-group $RESOURCE_GROUP \
  --query "id" -o tsv)
echo $STORAGE_ACCOUNT_URI
```
and the resource group URI (resource ID):
```shell
RESOURCE_GROUP_URI=$(az group show \
  --name $RESOURCE_GROUP \
  --query "id" -o tsv)
echo $RESOURCE_GROUP_URI
```
Give the Azure Databricks access connector permissions to the Azure storage account for blob data:
```shell
az role assignment create \
  --assignee-object-id $ACCESS_CONNECTOR_APP_ID \
  --role "Storage Blob Data Contributor" \
  --scope $STORAGE_ACCOUNT_URI
```
And queue data:
```shell
az role assignment create \
  --assignee-object-id $ACCESS_CONNECTOR_APP_ID \
  --role "Storage Queue Data Contributor" \
  --scope $STORAGE_ACCOUNT_URI
```
Give the Azure Databricks access connector permissions to the Azure resource group for event subscriptions:
```shell
az role assignment create \
  --assignee-object-id $ACCESS_CONNECTOR_APP_ID \
  --role "EventGrid EventSubscription Contributor" \
  --scope $RESOURCE_GROUP_URI
```
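To confirm all three role assignments landed, you can list the roles granted to the access connector's managed identity. A sketch:

```shell
# Expect three rows: Storage Blob Data Contributor and Storage Queue Data
# Contributor on the storage account, and EventGrid EventSubscription
# Contributor on the resource group
az role assignment list \
  --assignee $ACCESS_CONNECTOR_APP_ID \
  --all \
  --query "[].{role:roleDefinitionName,scope:scope}" \
  -o table
```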
Although Azure Databricks recommends using an access connector, the Azure Databricks REST API (and thus the CLI) does not support specifying the resource ID of the access connector on metastore creation as of API version 2.1. You need to set the resource ID of the Azure Databricks access connector in the GUI.
Follow the steps to create a new metastore in the Databricks UI using the `STORAGE_ROOT` ("ADLS Gen2 path" in the GUI) and `ACCESS_CONNECTOR_URI` ("Access Connector Id" in the GUI) from below.
```shell
DATABRICKS_METASTORE="ascend-dbx-metastore-$LOCATION"
echo $DATABRICKS_METASTORE
echo $LOCATION

STORAGE_ROOT="$STORAGE_CONTAINER@$STORAGE_ACCOUNT.dfs.core.windows.net/"
echo $STORAGE_ROOT

ACCESS_CONNECTOR_URI=$(az databricks access-connector show \
  --name $DATABRICKS_ACCESS_CONNECTOR \
  --resource-group $RESOURCE_GROUP \
  --query "id" -o tsv)
echo $ACCESS_CONNECTOR_URI
```
Assign the Databricks Unity Catalog metastore to the Databricks workspace
If you already assigned the Databricks metastore to the Databricks workspace in the Databricks UI after creating it, you're done!
```shell
DATABRICKS_DEFAULT_CATALOG="ascend_dbx"

METASTORE_ID=$(databricks metastores list -o json \
  | jq -r --arg location $LOCATION \
  '.[] | select(.region == $location) | .metastore_id')
echo $METASTORE_ID

WORKSPACE_ID=$(az databricks workspace show \
  --name $DATABRICKS_WORKSPACE \
  --resource-group $RESOURCE_GROUP \
  --query "workspaceId" -o tsv)
echo $WORKSPACE_ID

databricks metastores assign $WORKSPACE_ID $METASTORE_ID $DATABRICKS_DEFAULT_CATALOG
```
Check the Databricks Unity Catalog metastore is assigned to the Databricks workspace
Check the current metastore is set to the one you just assigned:
```shell
databricks metastores current
```
Unity Catalog is enabled on your workspace!
Refresh the Databricks UI for the Databricks workspace in your browser after Unity Catalog setup.
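As a final smoke test, you can confirm from the CLI that the default catalog created during assignment is visible. A sketch (assuming the Databricks CLI configured above):

```shell
# The default catalog set during assignment (ascend_dbx above) should appear
databricks catalogs list
```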
Useful links
- Create a Unity Catalog metastore - Azure Databricks
- Authorize unattended access to Azure Databricks resources with a service principal using OAuth - Azure Databricks
- Create an external location to connect cloud storage to Azure Databricks - Azure Databricks
- Set a managed storage location for a catalog - Azure Databricks