Architecture for Agentic Data Engineering
Your team is convinced. You've built a proof of concept — an agent that caught an Airflow DAG failure, traced it through your dbt lineage graph, and drafted a fix — and now someone's asking: "How do we do this at scale?" The instinct is to jump straight to the agent architecture question — single-agent or multi-agent? hierarchical or flat? — and start drawing diagrams.
That instinct will cost you. The teams that struggle most with agentic data engineering aren't failing because of the agent pattern they chose. They're failing because the data infrastructure underneath doesn't support agents working safely and effectively. The architecture decision has two layers, and most teams only think about one of them.
After this module, you'll be able to:
- Assess your data infrastructure readiness across the four capabilities that determine whether agentic systems succeed or fail
- Choose between single-agent, multi-agent, and hierarchical architectures using the decision framework
- Identify the primary failure modes for each infrastructure capability and agent pattern
- Evaluate an architecture recommendation — agree or push back with reasoning grounded in the decision framework
This is a 201-level module. If the term "agentic data engineering" is new to you, start with What Is Agentic Data Engineering → before continuing.
The Infrastructure Foundation for Agentic Data Engineering
Most agentic data engineering failures aren't about the agent. They're about the environment the agent runs in.
Teams that jump straight to prompting — give the agent a pipeline definition, tell it what to do — almost always have a bad time. The agent makes a mess, confidence in the approach collapses, and the project gets abandoned. The problem isn't the agent. It's that the environment wasn't set up to let the agent work. Four infrastructure capabilities determine whether agentic systems succeed or fail in production — and they apply regardless of which agent architecture pattern you choose.
Infrastructure before patterns. The single-agent vs. multi-agent decision only matters once your data platform can support agents safely and effectively. Without the four capabilities below, even the right architecture pattern will fail in production.
| Capability | What it provides to agents | Primary failure mode without it |
|---|---|---|
| Context: Unified metadata | Schema, lineage, partition state, run history — the information layer agents reason from | Agents reason blind; decisions degrade; compute wasted reprocessing already-current partitions |
| Tools: Agent toolset | Code access, runtime execution, pipeline inspection, external integrations — with isolated workspaces as the safe boundary | Agents can observe but not act; or act without a safety net; no feedback loop |
| Triggers: Event-driven automation | Agents subscribe to pipeline events and act without manual invocation | Every task requires human initiation; agents can only respond, never anticipate |
| Guardrails | Permission boundaries, approval workflows, and audit trails — what agents are allowed to do | Agents accumulate permissions without constraint; production risk grows silently as team trust grows |
Context: unified metadata
An agent operating without unified metadata is navigating blind. It can read your pipeline code — but it can't see schema information, partition state, data lineage, or whether a transformation has already run on a given partition. Without that context, the agent infers structure from code alone. That leads to worse decisions, compute waste on already-processed partitions, and recommendations that don't account for the current state of your data.
Unified metadata is what lets an agent migrating an existing pipeline trace the full lineage of that pipeline as it works, rather than guessing from a code definition. It's what lets an agent debugging a failure trace the root cause through upstream dependencies, rather than stopping at the transform that failed. It's what prevents an agent from reprocessing data that's already current — a particularly expensive mistake at scale.
Practical test: Can your agent answer "what is the current state of this partition, and what transformations have already run on it?" If answering that requires the agent to infer from code rather than read from metadata, you have a context gap — work through the infrastructure readiness checklist below before proceeding to architecture design.
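The practical test can be made concrete in code. This is a minimal sketch, assuming a hypothetical metadata record — the class, field names, and fingerprint scheme are illustrative, not a real platform API:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical metadata record for one partition — field names are
# illustrative, not a real platform API.
@dataclass
class PartitionState:
    partition_key: str
    last_transform: str       # name of the last transform that ran
    processed_at: datetime    # when it ran
    input_fingerprint: str    # hash of the inputs at processing time

def is_partition_current(state: PartitionState, current_fingerprint: str) -> bool:
    """An agent with metadata access answers 'is this already current?'
    with a lookup — no inference from code, no wasted reprocessing."""
    return state.input_fingerprint == current_fingerprint

state = PartitionState("2025-01-15", "daily_rollup", datetime(2025, 1, 16), "abc123")
is_partition_current(state, "abc123")   # inputs unchanged: skip reprocessing
is_partition_current(state, "def456")   # inputs changed: reprocess
```

The point of the sketch: with unified metadata, "has this already run?" is a read, not a guess. Without it, the agent has only the code and must infer.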
Tools: what the agent can actually call
An agent's effectiveness is bounded by its toolset. Three distinct tool capabilities matter here, and most teams only have one of them.
| Tool capability | What it provides | Without it |
|---|---|---|
| Code access | Read ingestion, transformation, and orchestration code | Agent infers pipeline behavior from structure; misses runtime state |
| Runtime access | Execute pipelines, monitor performance, test optimizations in staging | No feedback loop; agent is a code reviewer, not a collaborator |
| External integrations | Query warehouses, open PRs, send notifications via MCP (a standardized interface for connecting AI agents to external data sources, tools, and workflows — see modelcontextprotocol.io) and other interfaces | Agent must hand off to humans at each cross-system boundary |
Tool boundaries matter as much as tools themselves. Every capability available to an agent defines a potential failure surface. If there's one principle worth borrowing from software engineering for data pipelines, it's this: nothing goes to production without a review — and that principle applies with even more force when agents can generate and execute changes at machine speed. The right boundary pattern:
- Agents have write tools scoped to dev workspaces only
- All code changes surface as reviewable diffs (git or equivalent)
- Production is read-only — agents observe and triage but cannot write directly
- Runtime execution in isolation: agents run pipelines in staging, not production
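The boundary pattern above can be sketched as enforcement at the tool-invocation layer. This is a minimal illustration, assuming a hypothetical tool registry — the class, environment names, and tool signatures are made up for the example:

```python
# A minimal sketch of platform-enforced tool boundaries. The registry,
# environment names, and tool signatures are hypothetical.
class BoundaryViolation(Exception):
    pass

class ToolRegistry:
    # Writes are permitted only in these environments — enforced here,
    # at the platform layer, not by team convention.
    WRITABLE_ENVS = {"dev", "staging"}

    def __init__(self):
        self._tools = {}

    def register(self, name, fn, writes=False):
        self._tools[name] = (fn, writes)

    def invoke(self, name, env, *args, **kwargs):
        fn, writes = self._tools[name]
        if writes and env not in self.WRITABLE_ENVS:
            raise BoundaryViolation(f"{name} cannot write to {env}")
        return fn(*args, **kwargs)

registry = ToolRegistry()
registry.register("read_logs", lambda: "log lines", writes=False)
registry.register("patch_pipeline", lambda: "diff opened", writes=True)

registry.invoke("read_logs", env="prod")         # observing prod is allowed
registry.invoke("patch_pipeline", env="dev")     # writes scoped to dev
# registry.invoke("patch_pipeline", env="prod")  # raises BoundaryViolation
```

The design choice worth noting: the check lives in `invoke`, not in each tool. An agent cannot bypass it by calling a tool "the wrong way," which is what makes this platform enforcement rather than convention.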
An agent typically starts with narrow read-only access and accumulates permissions as team trust grows. Without explicit write boundaries, that drift becomes a production risk. The prevention pattern: write access to dev workspaces only, with hard separation enforced at the platform level rather than by team convention. Platform enforcement is harder to erode than convention, but it still requires deliberate governance: document what enforcement is in place and revisit it when access patterns change.
Triggers: event-driven automation
Context and tools together are necessary but not sufficient. Without event-driven automation, agents act only when you explicitly invoke them. You're still the alert system. You're still the one who notices the failure, copies the error, and asks the agent to look at it.
Event-driven automation connects metadata and tools into a system that responds without waiting for you to notice. Your pipeline runtime emits events — flow runs (individual pipeline executions), component completions (individual step completions within a pipeline), errors, anomalies — and agents subscribe to those events and act on them. A flow fails in production → the failure event triggers the agent → the agent reads error logs (context) and uses its toolset to run staging validation and open a branch (tools) → the agent notifies you with a summary. You don't have to notice the failure. The agent is already working on it.
This is also what makes complex workflows like migrations and tech debt remediation genuinely autonomous rather than supervised: agents respond to what they observe in the pipeline as it runs, not just to tasks you explicitly assigned.
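The subscribe-and-act loop described above can be sketched in a few lines. The event bus and handler names here are illustrative — real platforms (orchestrator listeners, automation hooks) expose their own subscription APIs:

```python
# A minimal event-subscription sketch for the trigger pattern. The
# EventBus class and event names are hypothetical, for illustration.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def emit(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)

actions = []

def on_flow_failure(event):
    # The agent starts working the failure without human initiation:
    # read logs (context), validate in staging (tools), then notify.
    actions.append(f"triage {event['flow']}: read logs, staging check, notify")

bus = EventBus()
bus.subscribe("flow.failed", on_flow_failure)
bus.emit("flow.failed", {"flow": "daily_rollup"})
# actions now records triage the agent began on its own
```

The structural shift is that the runtime, not the human, is the initiator: the human enters the loop at the review step, not the detection step.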
Guardrails: the fourth architectural layer
Context, Tools, and Triggers define what agents can know, do, and respond to. Guardrails define what they're allowed to do — under what conditions, with what permissions, and with what human oversight at each decision point.
Guardrails thread through all three pillars: they bound the metadata agents can query, constrain which tools agents can invoke against which environments, and define which trigger-response chains require human approval before proceeding. Getting the first three layers in place is the prerequisite; getting guardrails right is what determines whether the system is safe to run continuously in production.
This is deep enough to be its own module. The full framework — permission boundaries, approval workflows, audit trails, agent behavioral rules, and how governance structures evolve as agent scope grows — is covered in Governance & Security →.
| Without guardrails | With guardrails |
|---|---|
| Agent permissions drift upward as team trust grows | Boundaries enforced at platform level, not convention |
| No audit trail — agent actions invisible until something breaks | Every action logged with inputs, outputs, and reasoning |
| Trigger-response chains run without approval gates | High-risk paths require human sign-off before proceeding |
| Scope creep goes undetected | Permission changes require deliberate review |
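One way the "approval gates plus audit trail" column translates into code is a gate wrapped around high-risk actions. This is a sketch under stated assumptions — the risk classification, the approver callback, and the action names are all hypothetical:

```python
# A sketch of an approval gate for high-risk trigger-response chains.
# The decorator, approver callback, and action names are illustrative.
import functools

audit_log = []

def requires_approval(action_name, approver):
    """Hold high-risk actions until a human signs off; log every decision."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            approved = approver(action_name)
            audit_log.append({"action": action_name, "approved": approved})
            if not approved:
                return f"{action_name}: held for review"
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@requires_approval("schema_change", approver=lambda a: False)  # stub: auto-deny
def apply_schema_change():
    return "schema changed"

apply_schema_change()  # -> "schema_change: held for review"
# audit_log retains the decision trail whether or not the action ran
```

Note that the gate logs both outcomes: an audit trail that only records approved actions can't answer "what did the agent try to do?"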
Before selecting an agent architecture pattern, assess your infrastructure readiness. For each item, mark: ✅ In place / ⚠️ Partial / ❌ Not in place.
Context: unified metadata
- Agents can read schema information at query time (not inferred from code)
- Data lineage is surfaced and accessible programmatically
- Partition state is visible: which partitions have processed, by which transform, when
- Run history is queryable — agents can see what ran, when, and with what result
Tools: agent toolset + boundaries
- Agents have code access: can read ingestion, transformation, and orchestration code
- Agents have runtime access: can execute pipelines, monitor performance, and test optimizations in staging
- External integrations available: agents can open PRs, send notifications, query warehouses as needed
- Agents have write tools scoped to dev workspaces only
- All agent-generated changes appear as reviewable diffs (git or equivalent)
- Production is read-only for agents — no direct write access
- No changes reach production without explicit human approval
- A rollback mechanism exists for agent-generated changes
Triggers: event-driven automation
- Pipeline runtime emits events for flow starts, completions, failures, and anomalies
- Agents can subscribe to events and act without manual invocation
- Human escalation is built into the event loop (agent notifies before acting on high-risk paths)
Guardrails
- Permission boundaries are defined and enforced at the platform level (not by team convention)
- Approval workflows exist for high-risk agent actions (e.g., schema changes, production-adjacent writes)
- Agent actions are logged with enough detail to reconstruct what changed, when, and why
- A process exists for reviewing and adjusting agent permissions as scope evolves
Scoring: All ✅ → proceed to agent architecture design. Any ❌ → address before selecting an architecture pattern. Partial ⚠️ → acceptable if the gap is scoped — document the limitation explicitly in your ADR. For a deeper guardrails framework, see Governance & Security →.
Three Agentic Architecture Patterns
With the infrastructure foundation — layer one — addressed, the agent architecture decision — layer two — becomes tractable: which pattern fits this specific task? Most production agentic systems fall into one of three patterns. Each has a natural use case, a cost profile, and a failure mode you should understand before you commit.
A context window is the maximum amount of information an agent can hold and reason over in a single session — as task scope expands, this becomes the binding constraint for single-agent systems.
| Pattern | Description | Best for | Primary failure mode |
|---|---|---|---|
| Single-agent | One agent, one context window, full task lifecycle | Sequential workflows; well-bounded tasks; starting out | Context window pressure as scope grows |
| Multi-agent | Specialized agents per concern, working in parallel or sequence | Parallelizable workloads; tasks requiring diverse toolsets | Coordination overhead; information loss at handoffs |
| Hierarchical | Orchestrator agent + specialist subagents | Enterprise scale; complex multi-domain workflows | Orchestrator bottleneck; delegation errors cascade |
The evidence is consistent: single-agent systems often match or outperform multi-agent alternatives on sequential work, while multi-agent wins on decomposable parallel tasks. Two studies support this direction. A 2026 empirical evaluation found single-agent systems outperforming multi-agent alternatives on multi-hop reasoning under matched compute budgets — single-agent systems are more information-efficient under fixed compute. A controlled evaluation of five coordination architectures across 260 configurations found up to +80.8% improvement for multi-agent coordination on decomposable financial reasoning tasks and up to −70.0% degradation on sequential planning tasks, confirming that architecture-task alignment determines the outcome. Both use controlled benchmarks; treat the results as directional, not as universal production guarantees.
The pattern most teams should start with: single-agent. Add complexity only when you hit a concrete limitation — not because the architecture diagram looks more impressive.
What each pattern looks like in practice
Single-agent
Single-agent systems have one agent that receives context, reasons through the full task, calls tools, and returns a result. The same agent that reads the pipeline logs also writes the fix and opens the PR. This is simpler to debug (one reasoning trace), easier to observe (one stream), and cheaper to run. The constraint is context: as task scope expands, you're eventually asking one agent to hold more information than fits usefully in one window.
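The context constraint can be seen in miniature. This toy sketch (names and the summarization rule are illustrative, not any real agent framework) shows the moment single-agent systems start paying for scope: when the window fills, detail gets summarized away:

```python
# A single-agent loop in miniature: one context, one reasoning trace.
# The task, observations, and summarization rule are hypothetical.
def single_agent(task, observations, max_context_items=4):
    context = [task]  # everything the agent knows lives in one window
    for obs in observations:
        context.append(obs)
        if len(context) > max_context_items:
            # the constraint in practice: compress or drop earlier detail
            context = [context[0], f"summary of {len(context) - 1} earlier items"]
    return context

# Four observations overflow a four-item window: detail collapses to a summary.
single_agent("fix daily_rollup", ["log A", "log B", "log C", "log D"])
```

As long as the task fits the window, this is the simplest and cheapest pattern; the summarization branch is exactly where "well-bounded" stops being true.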
Multi-agent
Multi-agent systems split concerns across specialized agents. One investigates the failure; another writes the patch; a third validates it in staging. Specialization narrows each agent's context to the subset of pipeline metadata, logs, and code relevant to that agent's function only. The cost: every handoff between agents is a potential point of information loss. Agents summarize context for their successors, and summaries drop detail. At enough handoffs, the system loses coherence.
Hierarchical
Hierarchical systems add an orchestrating agent that breaks tasks into subtasks, delegates to specialists, and synthesizes results. This is the pattern that scales to complex, multi-domain workflows. The failure mode: if the orchestrator reasons poorly about delegation — assigns the wrong subtask to the wrong specialist, or loses track of state across dependencies — the whole system fails in ways that are hard to trace.
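A hierarchical system in skeleton form looks like the following sketch. The specialist names, routing table, and task strings are hypothetical — the point is where the failure mode lives:

```python
# A minimal hierarchical sketch: an orchestrator delegates subtasks to
# specialists and collects results. All names here are illustrative.
specialists = {
    "investigate": lambda t: f"root cause of {t}: upstream schema drift",
    "patch":       lambda t: f"patch drafted for {t}",
    "validate":    lambda t: f"{t} validated in staging",
}

def orchestrate(task):
    results = []
    # The orchestrator's delegation plan. If this routing is wrong —
    # wrong specialist, wrong order — every downstream step inherits
    # the error, which is the cascading failure mode described above.
    for step in ["investigate", "patch", "validate"]:
        results.append(specialists[step](task))
    return results

orchestrate("daily_rollup failure")
```

In a real system the plan is generated by the orchestrator's own reasoning rather than hard-coded, which is precisely why delegation errors are hard to trace: the bug is in a plan that existed only at runtime.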
Decision Framework: Choosing an Architecture Pattern
Use this framework when choosing an architecture for a new agent use case:
What "well-bounded" means: A task is well-bounded if it operates on a known scope, the decision space is enumerable, and the required context fits in one session without aggressive summarization. Counter-example: "audit all pipelines across all domains for schema drift" is unbounded — unknown scope, open-ended decision space.
The context boundary problem
Every architecture decision is also a context decision: what does each agent know, and what can it do?
| Pattern | Context boundary challenge | Design implication |
|---|---|---|
| Single-agent | Full lifecycle in one context window — can bloat quickly | Aggressive summarization; exclude artifacts already processed |
| Multi-agent | Each agent needs enough context to act, but not all context from the system | Define inter-agent handoff protocols explicitly |
| Hierarchical | Orchestrator must know enough to delegate; specialists need their domain context only | Separate orchestrator context from specialist context |
The practical guidance: before finalizing any architecture choice, write down the context each agent will have at the moment it needs to make its most critical decision. If you can't write that list clearly, the architecture isn't ready.
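The "write down the context" exercise can be captured as a manifest. This is a sketch — the dataclass, field names, and example agent are illustrative, not a required format:

```python
# A context manifest for one agent — a structured version of the
# "write it down" exercise. Field names and values are illustrative.
from dataclasses import dataclass, field

@dataclass
class ContextManifest:
    agent: str
    critical_decision: str
    metadata_available: list = field(default_factory=list)
    tools_available: list = field(default_factory=list)
    excluded: list = field(default_factory=list)  # deliberately out of scope

triage_agent = ContextManifest(
    agent="failure-triage",
    critical_decision="root-cause the failed transform, not a symptom",
    metadata_available=["lineage graph", "partition state", "run history"],
    tools_available=["read logs", "run staging validation"],
    excluded=["pipeline code from unrelated domains"],  # keeps the window lean
)
```

If you can't fill in every field for an agent's most critical decision, that's the signal from the guidance above: the architecture isn't ready. The `excluded` list matters as much as the others — it's where single-agent context bloat gets prevented.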
If context management is new to you, Context, Tools, and Triggers → covers the fundamentals before you apply them here.
Why Agentic Data Engineering Projects Fail
Before you commit to any agentic architecture, you need a realistic baseline for where the industry stands.
Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. The leading reasons: escalating costs that weren't scoped correctly at the start, unclear business value, and inadequate risk controls. This isn't a reason not to build — it's a reason to build with clear success criteria, cost visibility, and governance from day one. ADE 201 is the course about those things.
The four infrastructure gaps identified above — missing context, incomplete tooling, absent triggers, and ungoverned permissions — manifest in production as the following failure patterns.
The four most common failure patterns in production agentic systems:
- Missing infrastructure — Jumping straight to agent patterns without the Context / Tools / Triggers foundation in place. The agent makes decisions without the context it needs (no unified metadata); can observe pipelines but not safely act on them (no runtime tooling with boundaries); every task requires manual initiation (no event-driven triggers).
- Architecture mismatch — Choosing multi-agent for a sequential task because it sounds more sophisticated. The coordination overhead eliminates the performance gains, producing a system that's slower and harder to debug than the single-agent equivalent would have been.
- Context starvation — The architecture is correct but context-poor. The agent makes decisions without the information it would need to make them well. This is a context engineering failure, not an architecture failure — but it's often diagnosed as the latter.
- Scope creep without governance — An agent starts with narrow read-only access and gradually accumulates permissions as team trust grows. Without explicit governance structures, this becomes a production risk. More on this in Governance & Security →.
The teams succeeding in production share a pattern: they built the infrastructure foundation first, started simple (single-agent, narrow scope, high guardrails), measured the result against defined criteria, and expanded deliberately.
⏱ 15 minutes
Paste the prompt below into Otto (or any AI assistant). When you have the output, answer these three judgment questions before moving on:
- Do you agree with the architecture recommendation? The task — detect failure → trace lineage → propose fix → open PR — is sequential, not parallelizable. Based on the research cited in this module, which pattern should be favored for sequential work, and does the recommendation match?
- Did the AI flag the right infrastructure gaps? Two gaps here are critical blockers, not minor limitations: Airflow runs against production by default (agents have no safe staging runtime), and the shared service account means there are no enforceable permission boundaries. Did the output catch both? If it rated either as partial rather than not ready, do you agree — and why or why not?
- What's the one change that would justify escalating to hierarchical? Think about what would have to be true about the task scope or team size to make the coordination overhead of hierarchical worth it.
The architecture tells you what agents can know, do, and respond to — and where the guardrails are. Context Engineering covers what they'll actually know in order to act correctly. In the next module, you'll build the retrieval and prompt strategies that determine what each agent sees at its decision point.
- Infrastructure before patterns. Four capabilities determine whether any agent architecture succeeds: Context (unified metadata), Tools (code access, runtime execution, and safe workspace boundaries), Triggers (event-driven automation), and Guardrails (permission boundaries, approval workflows, and audit trails). Get these in place before selecting an agent pattern.
- Runtime access is the tool differentiator. Code access alone makes agents expensive code reviewers. Runtime access — the ability to execute pipelines, monitor performance, and validate output in staging — creates the feedback loop that makes agentic data engineering genuinely faster than supervised manual work.
- Tool boundaries matter as much as tools. Agents need write access to dev workspaces, not production. The detect → propose → human-approve → deploy loop is how you get the speed of agentic automation without the risk of silent production failures.
- Start with single-agent. A 2026 preprint finds that single-agent systems often match or outperform multi-agent alternatives on multi-hop reasoning tasks under matched compute budgets — directional evidence for start-simple, not a universal production guarantee. Add complexity when you hit a concrete limitation, not when the diagram looks more impressive.
- Build with success criteria, cost visibility, and governance from day one — Gartner identifies escalating costs, unclear business value, and inadequate risk controls as the leading causes of the 40%+ cancellation rate.
Ascend surfaces unified metadata — schema, lineage, partition state, run history — to Otto (Ascend's AI assistant) at query time. Developer workspaces are isolated from production, with CI/CD built into the platform's deployment model. When configured via Otto Automations, Otto can subscribe to pipeline events and initiate triage workflows autonomously, surfacing flow run results and notifications for human review before anything reaches production. (Note: Otto Automations is currently a preview feature — check release status before relying on it for production workloads.) The architectural principle — infrastructure before agent patterns — applies to any agentic data stack.
The infrastructure readiness checklist and decision framework get applied directly in Capstone Lab →, where you'll assess the Expeditions platform infrastructure, choose an architecture pattern, document your rationale, and write success criteria before building anything.
The architecture defines what agents can know, do, and respond to — but not what they'll actually know at decision time. Context Engineering covers the retrieval and prompt strategies that determine what each agent sees when it matters most.
Next: Context Engineering: The Skill That Separates Good from Great →
Additional Reading
- Ascend Agent Architecture: How Otto and Custom Agents Work (Ascend docs) — Otto overview — capabilities, agent mode, interaction modes, and links to custom agents and automations.
- Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning (arXiv:2604.02460, 2026) — Empirical evaluation showing single-agent systems outperforming multi-agent alternatives on multi-hop reasoning tasks under matched compute budgets — the clearest evidence for the start-simple recommendation; mechanism is information efficiency under compute constraints, not coordination overhead.
- Towards a Science of Scaling Agent Systems (arXiv:2512.08296, 2025) — Controlled evaluation of five coordination architectures across 260 configurations; source for the +80.8%/−70.0% performance range cited in this module. Treat directional findings, not magnitudes, as actionable.
- Gartner: 40%+ Agentic AI Projects Canceled by 2027 (Gartner, 2025) — The analyst baseline for project failure rates; essential context for setting realistic expectations with leadership.