Set up Databricks with Ascend

Overview

In this quickstart guide, you'll learn how to use Databricks as your Data Plane in Ascend.

This guide walks you through configuring the Databricks CLI, creating a service principal, SQL warehouse, Unity Catalog resources, and a cluster, granting access, storing your Databricks secret in Ascend, configuring your Project, and running the sales Flow.

Prerequisites

tip

While you can complete the setup steps below using the Databricks UI, we recommend using the CLI for automation and repeatability.

Set up the Databricks CLI

tip

If you've already followed an Ascend Databricks setup guide, your Databricks CLI is configured and you can skip this section. Just verify your default profile points to the correct Databricks workspace.

First, check if the Databricks CLI is configured:

DATABRICKS_WORKSPACE_URL=$(databricks auth env | jq -r .env.DATABRICKS_HOST)

if [[ -n "$DATABRICKS_WORKSPACE_URL" && "$DATABRICKS_WORKSPACE_URL" != "null" ]]; then
  echo -e "Using Databricks workspace:\n$DATABRICKS_WORKSPACE_URL"
fi

If the CLI isn't configured yet, set your Databricks workspace URL:

DATABRICKS_WORKSPACE_URL=<your-databricks-workspace-url>

Open your Databricks workspace:

open $DATABRICKS_WORKSPACE_URL

Create a Personal Access Token (PAT) in the Databricks UI, then configure the CLI:

databricks configure --host $DATABRICKS_WORKSPACE_URL

Enter your PAT when prompted.

tip

The Databricks CLI uses profiles to manage connections to multiple Databricks workspaces.
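
For example, if you work with more than one workspace, you can configure and then reuse a named profile (the profile name ascend below is just an example):

# Configure a named profile
databricks configure --host $DATABRICKS_WORKSPACE_URL --profile ascend

# Use that profile with any subsequent command
databricks metastores current --profile ascend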

Verify Unity Catalog metastore

Confirm your Databricks workspace has a Unity Catalog metastore assigned:

databricks metastores current

If you see a metastore_id, workspace_id, and default_catalog_name in the output, you're ready to proceed. If no metastore is assigned, follow one of our Databricks setup guides first.
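
If you prefer a scripted check, you can capture the metastore ID directly (a minimal sketch that assumes JSON output, as with the other commands in this guide):

METASTORE_ID=$(databricks metastores current -o json | jq -r '.metastore_id')
echo $METASTORE_ID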

Create service principal

Create a Databricks service principal for the Default Environment on your Ascend Instance:

ENV_DEFAULT_SP_APP_ID=$(databricks service-principals create \
--display-name "ascend-env-default-sp" \
| jq -r '.applicationId')
echo $ENV_DEFAULT_SP_APP_ID
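
To confirm the service principal exists, you can look it up by display name (a sketch that assumes the standard SCIM field names in the CLI's JSON output):

databricks service-principals list -o json \
| jq -r '.[] | select(.displayName == "ascend-env-default-sp") | .applicationId'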

Create SQL warehouse

Create a Databricks SQL warehouse for the Default Environment:

WH_ID_DEFAULT=$(databricks warehouses create \
--cluster-size "2X-Small" \
--auto-stop-mins 5 \
--min-num-clusters 1 \
--max-num-clusters 1 \
--enable-photon \
--enable-serverless-compute \
--no-wait \
--name "ASCEND_DEVELOPMENT_WAREHOUSE" \
| jq -r '.id')
echo $WH_ID_DEFAULT
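
Because the warehouse is created with --no-wait, you may want to check its state before using it, for example:

# Shows STARTING, RUNNING, STOPPED, etc.
databricks warehouses get $WH_ID_DEFAULT -o json | jq -r '.state'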

Set up Data Plane catalog and schema

Verify the current metastore:

SQL="select current_metastore()"
databricks api post /api/2.0/sql/statements --json \
'{"warehouse_id": "'"$WH_ID_DEFAULT"'", "statement": "'"$SQL"'"}'

Define the catalog and schema that your Default Data Plane will use:

ASCEND_DEFAULT_CATALOG="ascend_data_plane_default"
ASCEND_DEFAULT_SCHEMA="default"

Create the Unity Catalog:

SQL="CREATE CATALOG IF NOT EXISTS $ASCEND_DEFAULT_CATALOG"
databricks api post /api/2.0/sql/statements --json \
'{"warehouse_id": "'"$WH_ID_DEFAULT"'", "statement": "'"$SQL"'"}'

Create the schema within the catalog:

SQL="CREATE SCHEMA IF NOT EXISTS $ASCEND_DEFAULT_CATALOG.$ASCEND_DEFAULT_SCHEMA"
databricks api post /api/2.0/sql/statements --json \
'{"warehouse_id": "'"$WH_ID_DEFAULT"'", "statement": "'"$SQL"'"}'

Grant the Default service principal access to the catalog:

SQL="GRANT ALL PRIVILEGES ON CATALOG $ASCEND_DEFAULT_CATALOG TO \`$ENV_DEFAULT_SP_APP_ID\`"
databricks api post /api/2.0/sql/statements --json \
'{"warehouse_id": "'"$WH_ID_DEFAULT"'", "statement": "'"$SQL"'"}'

Grant the Default service principal access to the schema:

SQL="GRANT ALL PRIVILEGES ON SCHEMA $ASCEND_DEFAULT_CATALOG.$ASCEND_DEFAULT_SCHEMA TO \`$ENV_DEFAULT_SP_APP_ID\`"
databricks api post /api/2.0/sql/statements --json \
'{"warehouse_id": "'"$WH_ID_DEFAULT"'", "statement": "'"$SQL"'"}'

Create compute

Now create a Databricks cluster for your Default Data Plane.

tip

For more customization options, consider creating the all-purpose cluster in the Databricks UI. You can then copy the JSON configuration for reuse in the CLI or automation scripts.

Select a node type available in your Databricks cloud provider (the example below is an AWS instance type):

# AWS example; substitute an equivalent node type on Azure or GCP (e.g. Standard_DS3_v2 or n2-standard-4)
NODE_TYPE_ID="m5.large"

Create the cluster for the Ascend Default Environment:

CLUSTER_I_DEFAULT=$(databricks clusters create \
--data-security-mode DATA_SECURITY_MODE_STANDARD \
--autotermination-minutes 10 \
--kind CLASSIC_PREVIEW \
--node-type-id $NODE_TYPE_ID \
--num-workers 1 \
--timeout 60m0s \
--no-wait \
--cluster-name ascend-data-plane-default \
--json '{
"spark_version": "15.4.x-scala2.12",
"spark_conf": {
"spark.databricks.sql.initial.catalog.namespace": "'"$ASCEND_DEFAULT_CATALOG"'"
}
}' \
| jq -r '.cluster_id')
echo $CLUSTER_I_DEFAULT
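
Cluster creation also runs asynchronously with --no-wait, so you can check its provisioning state before moving on:

# Shows PENDING while the cluster is provisioning, then RUNNING
databricks clusters get $CLUSTER_I_DEFAULT -o json | jq -r '.state'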

Grant service principals compute access

tip

If you don't have permission to manage compute access in your Databricks workspace, ask your Databricks administrator to grant the service principals access to these resources.

databricks permissions set sql/warehouses $WH_ID_DEFAULT \
--json "{
\"access_control_list\": [
{
\"service_principal_name\": \"$ENV_DEFAULT_SP_APP_ID\",
\"permission_level\": \"CAN_USE\"
}
]
}"

DEFAULT_CLUSTER=$(databricks clusters list | grep "ascend-data-plane-default" | awk '{print $1}')
databricks api patch /api/2.0/permissions/clusters/$DEFAULT_CLUSTER --json '{"access_control_list": [{"service_principal_name": "'"$ENV_DEFAULT_SP_APP_ID"'", "permission_level": "CAN_ATTACH_TO"}]}'
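
To double-check the grants, you can list the permissions on each compute resource (a minimal sketch using the same permissions commands and API endpoint as above):

databricks permissions get sql/warehouses $WH_ID_DEFAULT
databricks api get /api/2.0/permissions/clusters/$DEFAULT_CLUSTER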

Create and store your Databricks secret

Securely store your Databricks credentials in Ascend's Environment Vault:

  1. In the Databricks UI, go to Settings > Identity and access > Service principals
  2. Select the ascend-env-default-sp service principal
  3. Go to Secrets and click Generate secret
  4. Choose an appropriate lifetime for your security requirements and click Generate
  5. Copy the generated secret value
  6. In your Ascend Instance, click your profile picture (top-right) and select Settings
  7. Go to Secrets & Vaults
  8. Select the Default Environment Vault
  9. Click Add secret
  10. Enter DATABRICKS_SECRET as the name and paste your secret in the Value field
  11. Click Create

Configure your Workspace & Project

note

Alternatively, you can keep the out-of-the-box Project configuration and create a new Databricks Data Plane Connection instead. This allows you to use the existing setup and proceed directly to running the sales Flow.

Configure your Default Ascend Project to use the Databricks-specific template:

  1. In the top-right corner of your Ascend Instance, click on your profile picture and select Settings

  2. Go to Projects & Deployments and select the Default Project

  3. Change the Project root to projects/default/databricks

  4. Click Save to apply your changes

  5. Navigate back to your Ascend Workspace via the homepage or with Cmd+K search navigation

  6. Open the Files panel and locate ascend_project.yaml toward the bottom of the file tree.

  7. Add your Databricks parameters to the configuration:

    databricks:
      workspace_url: <YOUR_WORKSPACE_URL>
      client_id: <YOUR_CLIENT_ID>
      cluster_id: <YOUR_CLUSTER_ID>
      cluster_http_path: <YOUR_CLUSTER_HTTP_PATH>
      warehouse_http_path: <YOUR_WAREHOUSE_HTTP_PATH>
      catalog: <YOUR_CATALOG>
      schema: default

    tip

    If you used the CLI for setup, run these commands to retrieve all your configuration parameters at once:

    Run commands
    # Get the workspace URL from the auth configuration
    WORKSPACE_URL=$(databricks auth env | jq -r .env.DATABRICKS_HOST)

    # Set the cluster ID if it isn't already set in this shell
    if [ -z "$CLUSTER_I_DEFAULT" ]; then
      CLUSTER_I_DEFAULT="<your-cluster-id>"
    fi

    # Confirm the cluster exists
    databricks clusters get $CLUSTER_I_DEFAULT > /dev/null

    # Construct the cluster HTTP path. ORG_ID is your Databricks workspace ID,
    # shown in the cluster's JDBC/ODBC connection details.
    ORG_ID="<your-databricks-org-id>"
    CLUSTER_HTTP_PATH="sql/protocolv1/o/${ORG_ID}/${CLUSTER_I_DEFAULT}"

    # Get the warehouse HTTP path
    WAREHOUSE_INFO=$(databricks warehouses get $WH_ID_DEFAULT 2>/dev/null)
    if [ $? -eq 0 ]; then
      WAREHOUSE_HTTP_PATH=$(echo "$WAREHOUSE_INFO" | jq -r '.odbc_params.path // "[Warehouse not ready]"')
    else
      WAREHOUSE_HTTP_PATH="[Error retrieving warehouse info]"
    fi

    # Display all values
    echo -e "\n\n---\n"
    echo "databricks:"
    echo "  workspace_url: \"$WORKSPACE_URL\""
    echo "  client_id: $ENV_DEFAULT_SP_APP_ID"
    echo "  cluster_id: $CLUSTER_I_DEFAULT"
    echo "  cluster_http_path: $CLUSTER_HTTP_PATH"
    echo "  warehouse_http_path: $WAREHOUSE_HTTP_PATH"
    echo "  catalog: $ASCEND_DEFAULT_CATALOG"
    echo "  schema: $ASCEND_DEFAULT_SCHEMA"
    echo -e "\n---"
  8. Paste these parameters into ascend_project.yaml and use Cmd + S / Ctrl + S to save your changes.

Run the sales Flow

  1. In the Super Graph view of your Ascend Workspace, double-click the sales Flow to open the Flow Graph view, which shows all Components in the sales Flow
  2. In the build info panel on the top left, click Run Flow and watch all Components in your Flow execute from left to right

View Ascend tables in Databricks

  1. Open the Databricks console and click Catalog in the left panel
  2. Expand the ascend_data_plane_default catalog
  3. Expand the default schema
  4. Select the read_sales_website table
  5. Click the Sample Data tab to view your data
note

If you're logged in with a personal Databricks account, you may need to grant yourself access to view the data.

Grant personal access
USER_EMAIL="YOUR_USER_EMAIL"

SQL="GRANT SELECT ON CATALOG ascend_data_plane_default TO \`${USER_EMAIL}\`"
databricks api post /api/2.0/sql/statements --json \
'{"warehouse_id": "'"$WH_ID_DEFAULT"'", "statement": "'"$SQL"'"}'

This demonstrates Data Plane persistence in action! With this setup, you can push your compute and storage down to the Databricks Data Plane for efficient data processing.

🎉 Congratulations! You've successfully configured Databricks as your Data Plane and run your first Flow!

Next steps

👩🏻‍💻 Follow the developer quickstart to build your own Flow in Ascend