
Architecture for Agentic Data Engineering

Your team is convinced. You've built a proof of concept — an agent that caught an Airflow DAG failure, traced it through your dbt lineage graph, and drafted a fix — and now someone's asking: "How do we do this at scale?" The instinct is to jump straight to the agent architecture question — single-agent or multi-agent? hierarchical or flat? — and start drawing diagrams.

That instinct will cost you. The teams that struggle most with agentic data engineering aren't failing because of the agent pattern they chose. They're failing because the data infrastructure underneath doesn't support agents working safely and effectively. The architecture decision has two layers, and most teams only think about one of them.

After this module, you'll be able to:

  • Assess your data infrastructure readiness across the four capabilities that determine whether agentic systems succeed or fail
  • Choose between single-agent, multi-agent, and hierarchical architectures using the decision framework
  • Identify the primary failure modes for each infrastructure capability and agent pattern
  • Evaluate an architecture recommendation — agree or push back with reasoning grounded in the decision framework
New to ADE?

This is a 201-level module. If the term "agentic data engineering" is new to you, start with What Is Agentic Data Engineering → before continuing.

The Infrastructure Foundation for Agentic Data Engineering

Most agentic data engineering failures aren't about the agent. They're about the environment the agent runs in.

Teams that jump straight to prompting — give the agent a pipeline definition, tell it what to do — almost always have a bad time. The agent makes a mess, confidence in the approach collapses, and the project gets abandoned. The problem isn't the agent. It's that the environment wasn't set up to let the agent work. Four infrastructure capabilities determine whether agentic systems succeed or fail in production — and they apply regardless of which agent architecture pattern you choose.

Infrastructure before patterns. The single-agent vs. multi-agent decision only matters once your data platform can support agents safely and effectively. Without the four capabilities below, even the right architecture pattern will fail in production.

| Capability | What it provides to agents | Primary failure mode without it |
| --- | --- | --- |
| Context: Unified metadata | Schema, lineage, partition state, run history — the information layer agents reason from | Agents reason blind; decisions degrade; compute wasted reprocessing already-current partitions |
| Tools: Agent toolset | Code access, runtime execution, pipeline inspection, external integrations — with isolated workspaces as the safe boundary | Agents can observe but not act; or act without a safety net; no feedback loop |
| Triggers: Event-driven automation | Agents subscribe to pipeline events and act without manual invocation | Every task requires human initiation; agents can only respond, never anticipate |
| Guardrails | Permission boundaries, approval workflows, and audit trails — what agents are allowed to do | Agents accumulate permissions without constraint; production risk grows silently as team trust grows |

Context: unified metadata

An agent operating without unified metadata is navigating blind. It can read your pipeline code — but it can't see schema information, partition state, data lineage, or whether a transformation has already run on a given partition. Without that context, the agent infers structure from code alone. That leads to worse decisions, compute waste on already-processed partitions, and recommendations that don't account for the current state of your data.

Unified metadata is what lets an agent migrating an existing pipeline trace the full lineage of that pipeline as it works, rather than guessing from a code definition. It's what lets an agent debugging a failure trace the root cause through upstream dependencies, rather than stopping at the transform that failed. It's what prevents an agent from reprocessing data that's already current — a particularly expensive mistake at scale.

Practical test: Can your agent answer "what is the current state of this partition, and what transformations have already run on it?" If answering that requires the agent to infer from code rather than read from metadata, you have a context gap. Work through the infrastructure readiness checklist below before proceeding to architecture design.
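To make that test concrete, here is a minimal sketch of what "read from metadata rather than infer from code" could look like. The in-memory store, its field names, and the `partition_status` helper are hypothetical stand-ins for whatever catalog or metadata API your platform exposes:

```python
# Hypothetical in-memory metadata store; a real platform would expose this
# through a catalog / lineage / partition-state API, not a dict.
METADATA = {
    ("orders_daily", "2024-06-01"): {
        "state": "current",
        "transforms_run": ["ingest_raw", "dedupe", "enrich_orders"],
        "last_run_at": "2024-06-01T02:14:00Z",
    },
}

def partition_status(table: str, partition: str) -> dict:
    """Answer: what is this partition's state, and what has already run on it?

    If the agent cannot get this answer from metadata, it must infer it from
    code instead, which is the context gap described above.
    """
    entry = METADATA.get((table, partition))
    if entry is None:
        return {"state": "unknown", "transforms_run": []}
    return entry

status = partition_status("orders_daily", "2024-06-01")
# With this answer available, the agent can skip reprocessing a partition
# that is already current, avoiding the expensive mistake described above.
should_reprocess = status["state"] != "current"
```

The specific fields matter less than the property: the answer comes from a queryable store, not from reading transformation code.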

Tools: what the agent can actually call

An agent's effectiveness is bounded by its toolset. Three distinct tool capabilities matter here, and most teams only have one of them.

| Tool capability | What it provides | Without it |
| --- | --- | --- |
| Code access | Read ingestion, transformation, and orchestration code | Agent infers pipeline behavior from structure; misses runtime state |
| Runtime access | Execute pipelines, monitor performance, test optimizations in staging | No feedback loop; agent is a code reviewer, not a collaborator |
| External integrations | Query warehouses, open PRs, send notifications via MCP (a standardized interface for connecting AI agents to external data sources, tools, and workflows — see modelcontextprotocol.io) and other interfaces | Agent must hand off to humans at each cross-system boundary |

Tool boundaries matter as much as tools themselves. Every capability available to an agent defines a potential failure surface. If there's one principle worth borrowing from software engineering for data pipelines, it's this: nothing goes to production without a review — and that principle applies with even more force when agents can generate and execute changes at machine speed. The right boundary pattern:

  • Agents have write tools scoped to dev workspaces only
  • All code changes surface as reviewable diffs (git or equivalent)
  • Production is read-only — agents observe and triage but cannot write directly
  • Runtime execution in isolation: agents run pipelines in staging, not production
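The boundary pattern above can be sketched as a small policy table. This is illustrative only: the environment and action names are assumptions, and real enforcement belongs in the platform's authorization layer, not in agent-side code.

```python
# Hypothetical write-boundary policy per environment. In practice the
# platform's authz layer enforces this; the agent should never be the
# component deciding its own permissions.
POLICY = {
    "dev":        {"read", "write", "execute"},
    "staging":    {"read", "execute"},   # run pipelines; code changes go via PR
    "production": {"read"},              # observe and triage only
}

def is_allowed(environment: str, action: str) -> bool:
    """Return True if an agent may perform `action` in `environment`."""
    return action in POLICY.get(environment, set())

# Agents draft fixes in dev, validate in staging, and never write to prod.
assert is_allowed("dev", "write")
assert is_allowed("staging", "execute")
assert not is_allowed("production", "write")
```

The design point is that the policy is explicit data that can be reviewed and audited, rather than an implicit team convention.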
The permission creep failure mode

An agent starts with narrow read-only access and gradually accumulates permissions as team trust grows. Without explicit, enforced write boundaries, this becomes a production risk. The pattern to prevent it: agents have write access to dev workspaces only, with hard separation enforced at the platform level — not by team convention. Platform enforcement is harder to erode than team convention — but it still requires deliberate governance. Document what enforcement is in place and revisit it when access patterns change.

Triggers: event-driven automation

Context and tools together are necessary but not sufficient. Without event-driven automation, agents act only when you explicitly invoke them. You're still the alert system. You're still the one who notices the failure, copies the error, and asks the agent to look at it.

Event-driven automation connects metadata and tools into a system that responds without waiting for you to notice. Your pipeline runtime emits events — flow runs (individual pipeline executions), component completions (individual step completions within a pipeline), errors, anomalies — and agents subscribe to those events and act on them. A flow fails in production → the failure event triggers the agent → the agent reads error logs (context) and uses its toolset to run staging validation and open a branch (tools) → the agent notifies you with a summary. You don't have to notice the failure. The agent is already working on it.
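A minimal sketch of that loop, with an in-process event bus standing in for whatever event system your runtime provides. The event type, payload shape, and handler are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EventBus:
    """Toy pub/sub bus: real runtimes deliver events via a queue or webhook."""
    handlers: dict = field(default_factory=dict)

    def subscribe(self, event_type: str, handler: Callable) -> None:
        self.handlers.setdefault(event_type, []).append(handler)

    def emit(self, event_type: str, payload: dict) -> list:
        # Return handler results so the example is easy to inspect.
        return [h(payload) for h in self.handlers.get(event_type, [])]

def triage_failure(event: dict) -> str:
    # In a real agent this step would read error logs (context), run staging
    # validation and open a branch (tools), then notify a human with a summary.
    return f"triage started for {event['flow']} run {event['run_id']}"

bus = EventBus()
bus.subscribe("flow.failed", triage_failure)
results = bus.emit("flow.failed", {"flow": "orders_daily", "run_id": "r-1042"})
# No human had to notice the failure or invoke the agent.
```

The structure is the point: the failure event, not a person, initiates the agent's work.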

This is also what makes complex workflows like migrations and tech debt remediation genuinely autonomous rather than supervised: agents respond to what they observe in the pipeline as it runs, not just to tasks you explicitly assigned.

Guardrails: the fourth architectural layer

Context, Tools, and Triggers define what agents can know, do, and respond to. Guardrails define what they're allowed to do — under what conditions, with what permissions, and with what human oversight at each decision point.

Guardrails thread through all three pillars: they bound the metadata agents can query, constrain which tools agents can invoke against which environments, and define which trigger-response chains require human approval before proceeding. Getting the first three layers in place is the prerequisite; getting guardrails right is what determines whether the system is safe to run continuously in production.

This is deep enough to be its own module. The full framework — permission boundaries, approval workflows, audit trails, agent behavioral rules, and how governance structures evolve as agent scope grows — is covered in Governance & Security →.

| Without guardrails | With guardrails |
| --- | --- |
| Agent permissions drift upward as team trust grows | Boundaries enforced at platform level, not convention |
| No audit trail — agent actions invisible until something breaks | Every action logged with inputs, outputs, and reasoning |
| Trigger-response chains run without approval gates | High-risk paths require human sign-off before proceeding |
| Scope creep goes undetected | Permission changes require deliberate review |

Infrastructure Readiness Checklist

Before selecting an agent architecture pattern, assess your infrastructure readiness. For each item, mark: ✅ In place / ⚠️ Partial / ❌ Not in place.

Context: unified metadata

  • Agents can read schema information at query time (not inferred from code)
  • Data lineage is surfaced and accessible programmatically
  • Partition state is visible: which partitions have been processed, by which transform, and when
  • Run history is queryable — agents can see what ran, when, and with what result

Tools: agent toolset + boundaries

  • Agents have code access: can read ingestion, transformation, and orchestration code
  • Agents have runtime access: can execute pipelines, monitor performance, and test optimizations in staging
  • External integrations available: agents can open PRs, send notifications, query warehouses as needed
  • Agents have write tools scoped to dev workspaces only
  • All agent-generated changes appear as reviewable diffs (git or equivalent)
  • Production is read-only for agents — no direct write access
  • No changes reach production without explicit human approval
  • A rollback mechanism exists for agent-generated changes

Triggers: event-driven automation

  • Pipeline runtime emits events for flow starts, completions, failures, and anomalies
  • Agents can subscribe to events and act without manual invocation
  • Human escalation is built into the event loop (agent notifies before acting on high-risk paths)

Guardrails

  • Permission boundaries are defined and enforced at the platform level (not by team convention)
  • Approval workflows exist for high-risk agent actions (e.g., schema changes, production-adjacent writes)
  • Agent actions are logged with enough detail to reconstruct what changed, when, and why
  • A process exists for reviewing and adjusting agent permissions as scope evolves

Scoring: All ✅ → proceed to agent architecture design. Any ❌ → address before selecting an architecture pattern. Partial ⚠️ → acceptable if the gap is scoped — document the limitation explicitly in your ADR. For a deeper guardrails framework, see Governance & Security →.
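The scoring rule can be written down directly. A sketch, using the mark symbols from the checklist; the function name and return strings are illustrative:

```python
def readiness_decision(marks: list[str]) -> str:
    """Apply the checklist scoring rule: any ❌ blocks, ⚠️ needs a scoped
    and documented gap, all ✅ clears the way to architecture design."""
    if "❌" in marks:
        return "address gaps before selecting an architecture pattern"
    if "⚠️" in marks:
        return "proceed only if gaps are scoped and documented in the ADR"
    return "proceed to agent architecture design"

# Example: one partial item (say, lineage docs generated manually) means the
# team may proceed, but must record the limitation explicitly.
decision = readiness_decision(["✅", "✅", "⚠️", "✅"])
```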

Three Agentic Architecture Patterns

With the infrastructure foundation — layer one — addressed, the agent architecture decision — layer two — becomes tractable: which pattern fits this specific task? Most production agentic systems fall into one of three patterns. Each has a natural use case, a cost profile, and a failure mode you should understand before you commit.

A context window is the maximum amount of information an agent can hold and reason over in a single session — as task scope expands, this becomes the binding constraint for single-agent systems.

| Pattern | Description | Best for | Primary failure mode |
| --- | --- | --- | --- |
| Single-agent | One agent, one context window, full task lifecycle | Sequential workflows; well-bounded tasks; starting out | Context window pressure as scope grows |
| Multi-agent | Specialized agents per concern, working in parallel or sequence | Parallelizable workloads; tasks requiring diverse toolsets | Coordination overhead; information loss at handoffs |
| Hierarchical | Orchestrator agent + specialist subagents | Enterprise scale; complex multi-domain workflows | Orchestrator bottleneck; delegation errors cascade |

The evidence is consistent: single-agent systems often match or outperform multi-agent alternatives on sequential work, while multi-agent wins on decomposable parallel tasks. Two studies point in this direction. A 2026 empirical evaluation of single- vs. multi-agent systems on reasoning tasks found single-agent systems outperforming multi-agent alternatives on multi-hop reasoning under matched compute budgets; in other words, single-agent systems are more information-efficient under fixed compute. A controlled evaluation of five coordination architectures across 260 configurations found up to +80.8% improvement for multi-agent coordination on decomposable financial reasoning tasks and up to −70.0% degradation on sequential planning tasks, confirming that architecture-task alignment determines outcome. Both use controlled benchmarks; results are directional, not universal production guarantees.

The pattern most teams should start with: single-agent. Add complexity only when you hit a concrete limitation — not because the architecture diagram looks more impressive.

What each pattern looks like in practice

Single-agent

Single-agent systems have one agent that receives context, reasons through the full task, calls tools, and returns a result. The same agent that reads the pipeline logs also writes the fix and opens the PR. This is simpler to debug (one reasoning trace), easier to observe (one stream), and cheaper to run. The constraint is context: as task scope expands, you're eventually asking one agent to hold more information than fits usefully in one window.

Multi-agent

Multi-agent systems split concerns across specialized agents. One investigates the failure; another writes the patch; a third validates it in staging. Specialization narrows each agent's context to the subset of pipeline metadata, logs, and code relevant to that agent's function only. The cost: every handoff between agents is a potential point of information loss. Agents summarize context for their successors, and summaries drop detail. At enough handoffs, the system loses coherence.

Hierarchical

Hierarchical systems add an orchestrating agent that breaks tasks into subtasks, delegates to specialists, and synthesizes results. This is the pattern that scales to complex, multi-domain workflows. The failure mode: if the orchestrator reasons poorly about delegation — assigns the wrong subtask to the wrong specialist, or loses track of state across dependencies — the whole system fails in ways that are hard to trace.

Decision Framework: Choosing an Architecture Pattern

Use this framework when choosing an architecture for a new agent use case:

What "well-bounded" means: A task is well-bounded if it operates on a known scope, the decision space is enumerable, and the required context fits in one session without aggressive summarization. Counter-example: "audit all pipelines across all domains for schema drift" is unbounded — unknown scope, open-ended decision space.
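One way to encode this module's selection guidance as executable logic. The three boolean inputs and their precedence are illustrative judgment calls, not fixed rules:

```python
def choose_pattern(well_bounded: bool, parallelizable: bool,
                   multi_domain: bool) -> str:
    """Sketch of the pattern-selection guidance from this module."""
    if well_bounded and not parallelizable:
        return "single-agent"    # sequential, bounded work: start simple
    if parallelizable and not multi_domain:
        return "multi-agent"     # decomposable parallel workload
    if multi_domain:
        return "hierarchical"    # orchestrator + domain specialists
    # Default to the simplest pattern; add complexity only when a
    # concrete limitation appears.
    return "single-agent"

# The exercise scenario below (detect failure -> trace lineage -> propose
# fix -> open PR) is sequential and bounded:
pattern = choose_pattern(well_bounded=True, parallelizable=False,
                         multi_domain=False)
```

A function like this is not a substitute for judgment; its value is forcing you to answer the three questions explicitly before committing to a pattern.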

The context boundary problem

Every architecture decision is also a context decision: what does each agent know, and what can it do?

| Pattern | Context boundary challenge | Design implication |
| --- | --- | --- |
| Single-agent | Full lifecycle in one context window — can bloat quickly | Aggressive summarization; exclude artifacts already processed |
| Multi-agent | Each agent needs enough context to act, but not all context from the system | Define inter-agent handoff protocols explicitly |
| Hierarchical | Orchestrator must know enough to delegate; specialists need their domain context only | Separate orchestrator context from specialist context |

The practical guidance: before finalizing any architecture choice, write down the context each agent will have at the moment it needs to make its most critical decision. If you can't write that list clearly, the architecture isn't ready.
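A lightweight way to follow that guidance is a per-agent "context manifest" that records what the agent will see, and deliberately will not see, at its critical decision point. The structure and field names here are assumptions, not a standard:

```python
from dataclasses import dataclass

@dataclass
class ContextManifest:
    """Per-agent record of context at the most critical decision point."""
    agent: str
    critical_decision: str
    available_context: list   # what the agent will actually see
    excluded_context: list    # what it deliberately will not see

    def is_ready(self) -> bool:
        # If you can't enumerate the context, the architecture isn't ready.
        return bool(self.available_context) and bool(self.critical_decision)

manifest = ContextManifest(
    agent="failure-triage",
    critical_decision="identify the failing transform's root cause",
    available_context=["airflow error log", "dbt lineage for affected model"],
    excluded_context=["unrelated domain pipelines"],
)
```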

If context management is new to you, see Context, Tools, and Triggers → for the fundamentals before applying them here.

Why Agentic Data Engineering Projects Fail

Before you commit to any agentic architecture, you need a realistic baseline for where the industry stands.

Industry context: project failure rates

Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. The leading reasons: escalating costs that weren't scoped correctly at the start, unclear business value, and inadequate risk controls. This isn't a reason not to build — it's a reason to build with clear success criteria, cost visibility, and governance from day one. ADE 201 is the course about those things.

The four infrastructure gaps identified above — missing context, incomplete tooling, absent triggers, and ungoverned permissions — manifest in production as the following failure patterns.

The four most common failure patterns in production agentic systems:

  1. Missing infrastructure — Jumping straight to agent patterns without the Context / Tools / Triggers foundation in place. The agent makes decisions without the context it needs (no unified metadata); can observe pipelines but not safely act on them (no runtime tooling with boundaries); every task requires manual initiation (no event-driven triggers).

  2. Architecture mismatch — Choosing multi-agent for a sequential task because it sounds more sophisticated. The coordination overhead eliminates the performance gains, producing a system that's slower and harder to debug than the single-agent equivalent would have been.

  3. Context starvation — The architecture is correct but context-poor. The agent makes decisions without the information it would need to make them well. This is a context engineering failure, not an architecture failure — but it's often diagnosed as the latter.

  4. Scope creep without governance — An agent starts with narrow read-only access and gradually accumulates permissions as team trust grows. Without explicit governance structures, this becomes a production risk. More on this in Governance & Security.

The teams succeeding in production share a pattern: they built the infrastructure foundation first, started simple (single-agent, narrow scope, high guardrails), measured the result against defined criteria, and expanded deliberately.

Exercise: Architecture Decision Record

⏱ 15 minutes

Paste the prompt below into Otto (or any AI assistant). Read the output — then answer the three judgment questions before moving on.

I'm the data engineering lead for a mid-size retailer. We're planning to add an AI agent
to triage failures on our orders_daily pipeline (ingestion → transform → quality check →
publish). Here's our current state:

Infrastructure:
- Context: Schema queryable in Snowflake; dbt lineage docs exist but are generated manually
on each release, not in real time; partition state and run history not surfaced to agents
- Tools: Agents can read our dbt and Airflow code in GitHub; we have a staging Snowflake
environment but Airflow DAGs run against production by default; all code changes go through
PRs but there's no required reviewer — engineers can self-merge
- Triggers: Airflow sends failure alerts to Slack; no programmatic event emission; the
on-call engineer monitors Slack and manually kicks off any investigation
- Guardrails: No formal agent permission policy; engineers share a service account with broad
Snowflake and Airflow access; no audit log for agent actions

Team: 4 engineers, mixed seniority. We see ~3 pipeline failures/week; median resolution
time is 45 minutes. The main pain point is schema drift from an upstream API — every failure
requires someone to manually trace the lineage and find the affected transform.

The agent's job: detect a failure, read the Airflow error and affected dbt model, identify
the likely root cause, propose a fix, open a PR, and notify the on-call engineer for review.

Using the single-agent, multi-agent, and hierarchical architecture patterns, which pattern
do you recommend — and why? What infrastructure gaps does the team need to close before
deploying safely? What would need to change (in infrastructure or task scope) for a
different pattern to make sense?

After you get the output, answer these before moving on:

  1. Do you agree with the architecture recommendation? The task — detect failure → trace lineage → propose fix → open PR — is sequential, not parallelizable. Based on the research cited in this module, which pattern should be favored for sequential work, and does the recommendation match?

  2. Did the AI flag the right infrastructure gaps? Two gaps here are critical blockers, not minor limitations: Airflow runs against production by default (agents have no safe staging runtime), and the shared service account means there are no enforceable permission boundaries. Did the output catch both? If it rated either as partial rather than not ready, do you agree — and why or why not?

  3. What's the one change that would justify escalating to hierarchical? Think about what would have to be true about the task scope or team size to make the coordination overhead of hierarchical worth it.


Key takeaways
  • Infrastructure before patterns. Four capabilities determine whether any agent architecture succeeds: Context (unified metadata), Tools (code access, runtime execution, and safe workspace boundaries), Triggers (event-driven automation), and Guardrails (permission boundaries, approval workflows, and audit trails). Get these in place before selecting an agent pattern.
  • Runtime access is the tool differentiator. Code access alone makes agents expensive code reviewers. Runtime access — the ability to execute pipelines, monitor performance, and validate output in staging — creates the feedback loop that makes agentic data engineering genuinely faster than supervised manual work.
  • Tool boundaries matter as much as tools. Agents need write access to dev workspaces, not production. The detect → propose → human-approve → deploy loop is how you get the speed of agentic automation without the risk of silent production failures.
  • Start with single-agent. A 2026 preprint finds that single-agent systems often match or outperform multi-agent alternatives on multi-hop reasoning tasks under matched compute budgets — directional evidence for start-simple, not a universal production guarantee. Add complexity when you hit a concrete limitation, not when the diagram looks more impressive.
  • Build with success criteria, cost visibility, and governance from day one — Gartner identifies escalating costs, unclear business value, and inadequate risk controls as the leading causes of the 40%+ cancellation rate.
How this works in Ascend

Ascend surfaces unified metadata — schema, lineage, partition state, run history — to Otto (Ascend's AI assistant) at query time. Developer workspaces are isolated from production, with CI/CD built into the platform's deployment model. When configured via Otto Automations, Otto can subscribe to pipeline events and initiate triage workflows autonomously, surfacing flow run results and notifications for human review before anything reaches production. (Note: Otto Automations is currently a preview feature — check release status before relying on it for production workloads.) The architectural principle — infrastructure before agent patterns — applies to any agentic data stack.

You'll use this in practice

The infrastructure readiness checklist and decision framework get applied directly in Capstone Lab →, where you'll assess the Expeditions platform infrastructure, choose an architecture pattern, document your rationale, and write success criteria before building anything.

The architecture defines what agents can know, do, and respond to — but not what they'll actually know at decision time. Context Engineering covers the retrieval and prompt strategies that determine what each agent sees when it matters most.

Next: Context Engineering: The Skill That Separates Good from Great →

Additional Reading