Skip to main content

Your First Agentic Pipeline — Guided Lab

This is a guided lab for building an agentic data pipeline with Otto on Ascend — configure context, tools, and triggers (CTT), then build and automate a real pipeline. You've mapped the lifecycle, traced the failure modes, and read the theory. The earlier modules described context, tools, and triggers as the three design decisions at the heart of agentic systems. This lab makes that concrete. You're going to configure each one yourself — not as a configuration exercise, but because each one changes what Otto can do and how reliably it does it.

Enough theory. Let's build something.

Terms in this lab

In this lab, agentic means you direct the work — you specify outcomes, configure the system, and verify results — while Otto plans, builds, and runs the pipeline. If you're new to the concept, What is Agentic Data Engineering? covers it from the ground up.

By the end of this lab, you will be able to:

  • Orient yourself to the Ascend platform and the otto/ project directory
  • Write a rules file that gives Otto persistent, domain-specific context for the Expeditions dataset
  • Identify the Context, Tools, and Triggers (CTT) active during each phase of the build
  • Write an outcome-specified prompt using clear intent (output shape, constraints, scope, and a pre-build check)
  • See the observe-decide-act loop in a live build during Phase 5 and explain what Otto is doing at each step
  • Schedule the pipeline with a run_flow automation and add a FlowRunFailure failure automation that emails you an Otto-generated diagnostic summary
Prerequisites

No prior agent-building experience required. This lab is Module 7 in ADE Foundations (101) — completing Modules 1–6 first is recommended but not required.

About the tools: This lab uses Ascend because it provides agentic data engineering infrastructure — data warehouse, orchestration, and an AI agent — in a single workspace, with no setup required beyond signing up. The patterns you'll practice (writing rules, specifying outcomes, building automations) apply to any agent-enabled data environment.

Phase 1: Setup

Step 1: Sign up for a free trial

tip

The Ascend free trial gives you 14 days or 100 Ascend Credits of access, whichever comes first. Check Getting Started for up-to-date limits and what happens when credits run out.

Go to app.ascend.io and click Start your free trial.

Ascend app.ascend.io landing page with Start Your Free Trial button

Sign up with Google, LinkedIn, Microsoft, or GitHub, or use an email address and password.

Ascend signup form showing Google, LinkedIn, Microsoft, GitHub, and email options

If you sign up with an email address and password, check your email for a verification message from support@ascend.io.

Step 2: Complete onboarding

Once signed in, you'll see the onboarding screen. When asked how you'd like to get started, select Start with Otto's Expeditions.

Onboarding — select Otto's Expeditions

Otto will provision your workspace — your project container in Ascend where your data and pipelines live — pre-loaded with the Expeditions data and flows (Ascend's term for a data pipeline). This usually takes under a minute or two.

Already have an Ascend account?

If you've already started a trial, skip to Step 3 and navigate to your existing workspace.

If you're not using Ascend

This lab uses Ascend because it provides agentic data engineering infrastructure in a single workspace. The patterns in each phase transfer to any agent-enabled environment:

  • Write a rule — add a conventions or business rules file that the agent reads before acting. In a dbt + large language model (LLM) setup, this is a conventions.md included in every context window.
  • Specify outcomes, not steps — applicable to any LLM-based tool. Define what "done" looks like; let the agent determine implementation.
  • Automate with triggers — wherever your orchestration layer supports event-driven runs, a cron schedule plus a failure webhook covers most production use cases.

Step 3: Configure Otto

Open Otto with Ctrl+I (Windows/Linux) or Cmd+I (Mac).

At the bottom of the Otto panel, click + Runtime and select your workspace. A Runtime is the compute environment that executes your pipelines — connecting it gives Otto access to your actual schema and data.

Enable web browsing and auto-run queries in the Otto panel so Otto can look up documentation and execute queries without manual approval at each step.

Web browsing and auto-run queries enabled in the Otto panel

What you just configured: Context, Tools, and Triggers

Connecting a Runtime and enabling web browsing and auto-run queries are tool decisions — not just setup steps. From Context, Tools, and Triggers:

  • Runtime gives Otto access to your workspace's real schema, lineage graph, and file structure. It's what lets Otto read your actual data model instead of guessing from training data — the difference between an agent working from real context and one working from a best guess.
  • Web browsing and auto-run queries are tool calls — permissions you're granting Otto to act. Web browsing lets Otto fetch live documentation and API specs; auto-run queries lets Otto execute SQL against your warehouse without pausing for manual approval at each step.

Tool scope is also a risk decision. In a production system, you'd scope tool access to the minimum necessary for the task.

Phase 2: Tour the platform

Before building anything, take 5 minutes to orient yourself. You'll use these areas throughout the lab.

The Workspace and Super Graph

Your workspace is the central hub for building and managing data pipelines. When you land after onboarding, you'll see the Super Graph — a high-level view of all flows in your workspace.

Super Graph

Double-click any flow to drill into its Flow Graph, which shows all the components inside it:

  • Read Components — ingest data from external sources
  • Transforms — process and transform data using SQL or Python
  • Write Components — send data to external destinations
  • Task Components — run custom code using Python or SQL

Flow Graph

Double-click any component to inspect its code, run history, and output data.

Code view showing a component's SQL and run history

The Toolbar

The toolbar on the right side of your screen gives you quick access to:

  • Resources — browse all datasets and components in your workspace
  • Files — view and edit project files, including your otto/ directory
  • Worksheets — run ad-hoc SQL queries against your data
  • Source Control — review diffs, commit changes, and manage branches

The otto/ directory

Click Files in the toolbar and open the otto/ folder. This is where you configure Otto's behavior for this project.

Otto directory

Here's the structure:

otto/
├── rules/ ← context files that shape Otto's behavior (you'll create one in Phase 4)
├── agents/ ← custom agents (not needed for this lab; covered in ADE 201)
├── commands/ ← reusable prompt templates (not needed for this lab; covered in ADE 201)
└── mcp.yaml ← external tool connections via Model Context Protocol

The agents/ and commands/ folders are used in ADE 201.

  • Ctrl+I (or Cmd+I on Mac) — open Otto from anywhere
  • Ctrl+K (or Cmd+K on Mac) — search and navigate anywhere in the platform

You're oriented. Now let's understand the data.

Phase 3: Explore — understand the data before building

Otto's Expeditions project is a sample dataset modeling a fictional outdoor expeditions business: trips, guides, customers, bookings, and revenue. Before you build anything, explore what's there. Understanding the data before you design a pipeline is a habit worth keeping regardless of the tool you use.

Open Otto and try these prompts one at a time, reading the responses before moving on:

What tables are in the Expeditions project? Give me a brief description
of what each one represents.

Read the response. Then:

Show me the schema for the bookings table. What are the key fields,
and what are the data types?

And then:

How are the tables related to each other? Walk me through the
join relationships I'd need to produce a customer-level summary
of bookings and revenue.
What to notice

When Otto has Runtime access to your workspace, it uses tools to read your actual schema, metadata, and lineage — rather than relying solely on training data. If you compare the schema details it returns to what's in your workspace, they should closely align — spot-check any discrepancies in the UI or metadata view. Responses may still vary based on tool access, permissions, and model behavior.

Pay attention to how Otto describes the bookings table status field and what it counts as revenue. You'll be encoding your own definition in the next phase — notice whether Otto's default interpretation matches what you'd want.

Take a moment to orient yourself in the data before moving on. Understanding what you're working with is the first context engineering (deliberately choosing what information the agent sees) decision.

Phase 4: Context — write a rule

You just explored the Expeditions data. Now encode what you learned as durable context — so Otto uses your business definitions every time, not its own.

What rules are

Rules are markdown files stored in otto/rules/ that get injected into Otto's context when a condition matches:

  • Always-on (alwaysApply: true) — fires in every conversation in this project
  • Glob-scoped — fires when Otto is working with matching file types (e.g., *.sql, *.py)
  • Keyword-scoped — fires when a specific phrase appears in your prompt

They're how you encode your team's domain knowledge, business definitions, and standards as durable context — rather than re-explaining them in every prompt. In the Context, Tools, and Triggers framework, rules files are the primary mechanism for engineering your context stack.

Write a code standards file for Expeditions

Open a new Otto thread and paste this prompt:

Create an always-on rule at otto/rules/expeditions-code-standards.md.
Define these coding standards for all SQL in this project:

1. Use snake_case for all column names and model names.
2. Staging models are prefixed with stg_ and transformation models with tr_.
3. Every model must include a comment block at the top describing what it produces and the grain (what one row represents).
4. Filter out cancelled and refunded bookings before any revenue calculation.
5. Always alias aggregated columns explicitly — no unnamed expressions.

Keep the rule short and specific.

What to look for

After Otto creates the file, open it in the Files panel — click the Files icon in the toolbar, then navigate to otto/rules/. The file opens with a YAML frontmatter block (a section delimited by --- markers at the top of the file) containing:

otto:
rule:
alwaysApply: true
description: "SQL coding standards for the Expeditions project"

The alwaysApply: true means this rule fires on every conversation in this project — always in Otto's context stack. In a more complex project, you'd use globs or keywords to scope rules, so detailed guidance only loads when it's relevant and unrelated context stays out of the window.

Why this changes what Otto builds

Without this rule, Otto makes its own choices about column naming, model prefixes, and comment style. With it, every model Otto generates follows your team's conventions. When you audit or hand off the code, it reads like it was written by a single author — because the standards were consistent from the start. That auditability is part of what makes agentic output governable.

Phase 5: Build — specify outcomes, plan, then execute

The rule is in place. Now build the pipeline. The pattern here is: specify what you want → ask Otto to plan how — then review before any code gets written. Planning catches misunderstandings when they're still sentences, not SQL.

Specify the outcome and request a plan

Open a new Otto thread and paste this prompt:

I want to build a new flow that shows monthly revenue trends by channel
over the available data range.

The output should have one row per channel per month, with columns for:
channel, transaction count, distinct customer count, average transaction
value, and total revenue.

Before writing any code, create a plan at plan/lab-101.md that covers:
1. Which tables you'll read from and what each one represents
2. What transforms or aggregations are needed to produce this output
3. What assumptions you're making about the data that could affect the result

Don't write any SQL yet. I'll review the plan before you start building.

Read the plan before continuing. Check:

  • Does it name the actual source tables from this project (not generic examples)?
  • Does it state the output grain explicitly — one row per what?
  • Does the assumptions list name specific things — not just "I assumed the data is clean"?

When the plan is grounded and specific, tell Otto to proceed — then watch what happens.

The observe-decide-act loop in action

While Otto builds, watch what happens when it hits an error — because it will. It reads the error output (observe), picks a different approach (decide), runs the component again (act), and checks whether it worked. That cycle is the core loop from How Agents Work: a for loop with an LLM call inside it, running until the task resolves or a stopping condition fires.

This isn't intelligence — it's structure. The result of each tool call gets injected back as context for the next prompt. Otto isn't "thinking harder" on the retry. It's running another iteration with more information in the context window.

Practical implication: If Otto loops without converging after several attempts, the fix is usually more specific context — not rephrasing the same prompt.

While Otto builds, watch for:

  • Observe — what does Otto read? (error messages, schema, existing code)
  • Decide — what does it choose to do next? (rewrite the SQL, try a different join)
  • Act — what does it execute? (edit a file, run a query, return output)

Spot-check the output

Before moving to triggers, ask Otto for a quick sanity check:

Give me a quick summary of what the flow produces. How many distinct
customers are represented? Which channels have the most transactions?
Does anything about the distribution look off?

Read the answer. If something looks off — a channel with unexpectedly low or zero transactions, or a total that doesn't match your rough expectation — ask Otto to show you the underlying records for that specific number.

note

The full verification framework — aggregate check, code inspection, data-level drill-down — is in Trust and Verify in ADE 201. You'll apply all three checks to this pipeline in the capstone lab.

Production considerations

This lab focuses on speed of building. The next course covers the concerns senior teams raise before promoting agentic pipelines to production: cost and latency of agent loops, PII in prompts, immutable audit of generated SQL, and role-based access control (RBAC) on Agent mode. See ADE 201 when you're ready.

Phase 6: Trigger — automate the pipeline

The pipeline runs when you prompt Otto to run it — Otto is an agent that operates on the pipeline, not part of it. A prompt is a manual trigger. For anything that should run reliably in production, you need a trigger that doesn't depend on someone being available. Here you'll add two automations: one that runs the pipeline on a schedule, and one that alerts you when it fails.

What "automate" means in a trial workspace

The automation YAML files you'll create in this phase tell Ascend when and how to trigger your flow. Automations only execute in Deployments — a separate environment that runs on a branch of your project. In a trial workspace, you will author the YAML files and they will be valid configuration, but they will not fire until you merge your changes to a Deployment.

To see the automations run: create a Deployment and merge your workspace changes. If you want to complete this phase as a configuration exercise first and deploy later, that's a reasonable path — the YAML files you create here are reusable.

You're done with Phase 6 when:

  • automations/ contains a schedule YAML with a cron expression and run_flow action
  • automations/ contains a failure YAML with a FlowRunFailure trigger and email_alert action
  • ✓ Both files are syntactically valid (Otto will tell you if there's a YAML error)

Authoring valid, production-style automation config is the learning milestone for this phase — execution happens after you merge to a Deployment.

Schedule the pipeline

In the same Otto thread, paste:

Create an Automation YAML file that runs this flow every day at 6 AM UTC.
Save it in the automations/ folder.

Otto will create a YAML file in automations/. Open it in the Files panel. The file wires a time-based trigger (a cron schedule; for example, 0 6 * * * means every day at 6:00 UTC) to a run_flow action — that's the full trigger pattern: schedule tick → automation fires → run_flow executes the pipeline. This automation is pure orchestration; the schedule trigger does not invoke Otto as an agent. (The failure automation below uses email_alert with an Otto-generated summary embedded — a different pattern.)

Add failure alerting

Now add the second automation — the one that fires when something breaks:

Create a separate automation YAML for this flow. When this flow fails,
send me an email alert that includes an Otto summary of what failed
and its best diagnosis of why.

Otto will generate a YAML with the to: field populated — open the file to confirm or update the email address before merging to a Deployment. Verify that outbound email is enabled in your instance settings, or see Send email alerts for configuration steps.

Email failure alerting automation

Event triggers vs. schedules — and two failure-response patterns

A schedule fires on a clock event and knows nothing about what happened in the pipeline. An event trigger (FlowRunFailure) fires when a specific system event occurs — and it can carry context about that event (which pipeline failed, which component, what the error was) directly into Otto's context.

Otto's diagnostic output is AI-generated — treat suggested causes and next steps as hypotheses to validate, not verified root cause analysis. From Context, Tools, and Triggers: "The best triggers are the ones that currently wake you up at 3am."

Tools: MCP connections

You've given Otto context via a rules file and configured event-based triggers via automations. The third axis of the CTT framework — tools — includes not just workspace access (Runtime) but external integrations via MCP (Model Context Protocol).

MCP is how you wire Otto to services outside Ascend. You author an otto/mcp.yaml file that registers the server, and Otto can then invoke those external tools alongside its workspace tools.

MCP gives Otto access to external systems:

  • Slack or Teams — post alerts and summaries
  • GitHub — open issues, read PR context
  • PagerDuty — route incidents

Setting up an MCP server requires external service credentials and is covered in a dedicated tutorial: Set up Slack via MCP. You'll use MCP in the ADE 201 capstone to route failure alerts to a Slack channel rather than email.

Two ways to respond to a failure event
  • email_alert + include_otto_summary: true — Ascend sends the email directly with Otto's diagnostic summary embedded. This is what you just built.
  • run_otto — invokes Otto as a full agent in response to the event. Otto can take actions — post to Slack via MCP, open a GitHub issue, run a query to gather context. This is the pattern covered in the ADE 201 capstone.

What just happened

You configured all three building blocks of the Context, Tools, and Triggers (CTT) framework — deliberately, in the right order:

  • Context — a code standards file injected into every Otto conversation in this project. Every model Otto generated followed your naming conventions and comment requirements — without you re-explaining them in each prompt.
  • Tools — Runtime access scoped to your workspace; Agent mode granting action permissions; MCP as the path to extend tool access to external services.
  • Trigger — two automations: a daily schedule that runs the pipeline without a human prompt, and a failure event that fires Otto's diagnostic loop when something breaks.

In this lab path, you didn't need to hand-author SQL — Otto owned the SQL, with you specifying outcomes and reviewing the output. Reading the SQL Otto produced to verify it matched your business rules — not writing SQL from scratch — still mattered for encoding the right business logic in a rule and checking that the output matched your definition. The skill that mattered most wasn't code generation. It was clarity about what you wanted, context that made the agent's behavior predictable, and a check that confirmed it.

That's the division of labor in agentic data engineering. The agent executes. You direct, configure, and govern.

Key takeaways
  • Rules files are your context stack. Encoding standards in otto/rules/ gives Otto persistent instructions that shape every output — without re-explaining them in every prompt. A well-maintained rules library compounds over time: each standard you add makes Otto's output more predictable and auditable.
  • Specify outcomes, not steps. Prompts that tend to perform well define what "done" looks like — the output schema, the grain (what one row represents), the constraints — and let the agent determine implementation. Include a verification step ("confirm which tables contain X before building") for any non-trivial build.
  • Triggers turn a pipeline into a system. A flow that only runs when you prompt it is a tool. A flow with a schedule automation and a failure trigger is a system — one that operates without requiring a person to be available, and that wakes Otto instead of waking you when something goes wrong.
Optional next steps

If you're evaluating Ascend for your team, you can book a session with a data engineer → to explore what an agentic approach would look like in your environment. You can also explore the Ascend documentation for more on Otto, flows, rules, and automations.

You've completed all the hands-on labs in ADE Foundations. Ready to make it official?

← Back to ADE Foundations (101) — and check whether your course includes a certification assessment.

Additional Reading

Ascend documentation:

  • Ascend developer docs — Search "Otto" for agent configuration, Runtime setup, and tool access; search "rules" for the full rules file reference; search "automations" for the trigger and automation configuration reference.
  • Set up Slack via MCP — Step-by-step tutorial for connecting Slack to Otto via Model Context Protocol.

Context engineering:

  • Anthropic — Prompt Engineering Overview — An orientation to prompt engineering with pointers to specific techniques and Console tools that apply directly to the outcome-specified prompts you used in this lab.

Agentic output verification: