
Self-contained DuckDB scripts

Cody
Product @ Ascend

Reproducible and easy-to-share SQL scripts for DuckDB.

Self-contained Python scripts are increasingly common via uv (which makes them fast and convenient to run) -- we use them extensively at Ascend. They start like this:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "typer",
# ]
# ///
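
If you prefer not to hand-write that header, uv can also manage the inline metadata for you. A minimal sketch, assuming a script named hello.py (the filename is just for illustration):

# create a script with inline metadata, then add a dependency to it
uv init --script hello.py --python 3.12
uv add --script hello.py typer

# run it via uv, or add the shebang above and chmod +x it to run directly
uv run hello.py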

You can also do this with DuckDB (and other) SQL scripts. This is useful for sharing reproductions:

#!/usr/bin/env -S duckdb -bail -init

INSTALL ducklake;
ATTACH 'ducklake:/tmp/ducklake_repro.ducklake' AS lake (DATA_PATH '/tmp/ducklake_repro_data/');
USE lake;

DROP TABLE IF EXISTS t;
CREATE TABLE t (id INT, val INT);
INSERT INTO t VALUES (1, 100);

-- the source has a duplicate id=1, which triggers the error
UPDATE t SET val = src.val
FROM (VALUES (1, 200), (1, 300)) AS src(id, val)
WHERE t.id = src.id;

You can run the script with ./repro.sql (or whatever filename you use); note that you need to chmod +x it or otherwise make it executable first. The shebang works because env -S splits the single shebang argument into separate flags, so DuckDB is invoked as duckdb -bail -init repro.sql: -init processes the script file itself on startup, and -bail stops at the first error.
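
For example, assuming you saved the script above as repro.sql:

chmod +x repro.sql

# optional: remove the catalog and data files for a completely clean slate
# (the script drops and recreates the table, so re-running is safe anyway)
rm -rf /tmp/ducklake_repro.ducklake /tmp/ducklake_repro_data/

./repro.sql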

Using variables

When we were learning & evaluating DuckLake for our own use cases, we created shareable dev.sql and dev.py scripts. In Python, it's trivial to create variables and override them upon script invocation. In DuckDB SQL it's less straightforward, so I instead wrote a Bash script that calls DuckDB:

#!/usr/bin/env bash

set -euo pipefail

# use the LAKE env var if set, otherwise default to ~/lake
LAKE="${LAKE:-$HOME/lake}"
mkdir -p "$LAKE"

# use a here-doc to pass SQL to duckdb with the LAKE variable
duckdb -cmd "$(cat <<EOF
install ducklake;
install sqlite;

create secret (
    type ducklake,
    metadata_path 'sqlite:$LAKE/metadata.sqlite',
    data_path '$LAKE/data'
);

attach 'sqlite:$LAKE/metadata.sqlite' as metadata;
attach 'ducklake:' as data;

use data;

EOF
)"

The file was just named dev.sql, but it allows for a configurable data lake location. Because the SQL is passed to duckdb via -cmd, DuckDB runs it on startup and then drops you into an interactive shell with the lake attached. You can use this approach if you need arguments and/or environment variables to change the behavior of your SQL script. And if you're curious, dev.py launches an IPython session with an Ibis connection to the same DuckDB with DuckLake data:
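
For example (the /tmp path is just for illustration):

# default lake location (~/lake)
./dev.sql

# point this run at a different lake via the environment
LAKE=/tmp/scratch_lake ./dev.sql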

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "==3.13"
# dependencies = [
#     "ibis-framework[bigquery,duckdb,postgres,sqlite]",
#     "ipython",
#     "pandas",
#     "plotly",
#     "polars",
#     "rich",
#     "typer",
# ]
# ///

# imports
import os
from importlib import reload # noqa

import ibis
import IPython
from rich import console

# Ibis configuration
ibis.options.interactive = True
ibis.options.repr.interactive.max_rows = 40
ibis.options.repr.interactive.max_columns = None

# rich configuration
console = console.Console()
print = console.print

# path configuration
lakedir = os.getenv("LAKE", os.path.join(os.path.expanduser("~"), "lake"))
metadata_db = os.path.join(lakedir, "metadata.sqlite")
data_dir = os.path.join(lakedir, "data")
os.makedirs(data_dir, exist_ok=True)

# print banner
banner = """
▓█████▄ ▓█████ ██▒ █▓
▒██▀ ██▌▓█ ▀▓██░ █▒
░██ █▌▒███ ▓██ █▒░
░▓█▄ ▌▒▓█ ▄ ▒██ █░░
░▒████▓ ░▒████▒ ▒▀█░
▒▒▓ ▒ ░░ ▒░ ░ ░ ▐░
░ ▒ ▒ ░ ░ ░ ░ ░░
░ ░ ░ ░ ░░
░ ░ ░ ░
░ ░
""".strip()
console.print(banner, style="bold purple")

# create Ibis connections
metacon = ibis.sqlite.connect(metadata_db)
con = ibis.duckdb.connect()
init_sql = f"""
install ducklake;
install sqlite;

create secret (
    type ducklake,
    metadata_path 'sqlite:{metadata_db}',
    data_path '{data_dir}'
);

attach 'sqlite:{metadata_db}' as metadata;
attach 'ducklake:' as data;

use data;
""".strip()
con.raw_sql(init_sql)
ibis.set_backend(con)

# create IPython shell
IPython.embed(
    banner1="",
    banner2="",
    display_banner=False,
    exit_msg="",
    colors="linux",
    theming="monokai",
)
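
Both scripts read the same LAKE environment variable, so you can explore a lake in SQL with dev.sql and then open an Ibis session over the same data with dev.py; for example:

chmod +x dev.py
LAKE=/tmp/scratch_lake ./dev.py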