
Context Engineering: What Your Agent Knows Before It Acts

Context engineering is the practice of giving an agent the right information — in the right structure, at the right time — to make decisions you'd trust. This module builds on the CTT framework (Context, Tools, Triggers) from ADE 101 to show what a production context stack looks like.

Two engineers each ask their agent to extend the orders_daily pipeline to handle a new product category. One gets a working pipeline quickly. The other gets something that runs but produces wrong numbers — and spends an hour debugging before discovering the agent ignored the team's null-handling convention and assumed a surrogate key (a system-generated ID that links tables) that doesn't exist in this schema.

Same model. Same task. Same agent. The difference was:

  • What the agent knew before it started — team conventions, schema contracts, historical norms
  • What standards it could reference — documented null-handling, join-key rules, output expectations
  • What "normal" looked like — data profiles and past decisions it could compare against

Context engineering is what makes the first outcome repeatable rather than occasional.

In this module you will:

  • Design a layered context stack for a production data engineering agent
  • Configure rules, commands, and custom agents to programmatically shape agent behavior
  • Apply the learning loop pattern to compound context quality over time
  • Choose between retrieval-augmented generation (RAG), long-context, and hybrid retrieval strategies based on your task requirements
  • Evaluate when the Model Context Protocol (MCP) is the right integration pattern — and when native platform tools make it unnecessary

Context engineering in depth

The Context, Tools, and Triggers module in ADE Foundations introduced the CTT framework and the six context categories. Here we go deeper — into what a production-grade context stack actually looks like, how to build one, and how to make it compound over time.

A mature context stack for a data engineering agent contains distinct layers, each serving a different function:

| Layer | What it contains | Why it matters |
| --- | --- | --- |
| System instructions | Role, non-negotiables, output format expectations | Sets the agent's operating boundaries for every interaction |
| Team conventions | Coding standards, scheduling constraints, data contracts — any standard that would otherwise exist only in someone's head | Encodes team knowledge that agents can reference before taking action |
| Schema metadata | Field types, nullability, primary/foreign key relationships, upstream quirks | Prevents structural errors and wrong-join assumptions |
| Data profiles | Distributions, null rates, cardinality, historical value ranges | Lets the agent distinguish "legitimate upstream change" from "something is broken" |
| Lineage context | Upstream sources, downstream consumers, dependency graph | Prevents fixing one thing while silently breaking another |
| Past decisions | Structured decision logs, incident postmortems, design rationale | Compounds over time — the agent benefits from what the team has already reasoned through |
| Business rules | Why this column never has nulls in production; why this join key is composite; what "valid" means for this domain | The tacit knowledge that separates a correct pipeline from a technically-valid-but-wrong one |

The principle behind the table: if a standard lives only in someone's head, the agent doesn't know it exists. If it's a file the agent can read, it's part of the context stack. The practical starting point for any team is a one-week sprint converting head-held knowledge into readable files. That sprint compounds.
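To make the data profiles layer concrete, here is a minimal sketch of how an agent-side check might compare a run against documented norms. The `DataProfile` class and `profile_anomalies` function are illustrative names, not part of any platform API; the example values come from the orders_daily profile described later in this module.

```python
from dataclasses import dataclass, field

@dataclass
class DataProfile:
    """Historical norms for a table, loaded from the context stack."""
    row_count_range: tuple  # e.g. 7-day min/max
    expected_null_rates: dict = field(default_factory=dict)
    null_rate_tolerance: float = 0.02  # absolute deviation allowed

def profile_anomalies(profile, row_count, null_rates):
    """Compare a run against the profile; return human-readable flags."""
    flags = []
    lo, hi = profile.row_count_range
    if not lo <= row_count <= hi:
        flags.append(f"row count {row_count} outside expected range {lo}-{hi}")
    for col, expected in profile.expected_null_rates.items():
        observed = null_rates.get(col, 0.0)
        if abs(observed - expected) > profile.null_rate_tolerance:
            flags.append(f"{col} null rate {observed:.1%} vs expected {expected:.1%}")
    return flags

profile = DataProfile(row_count_range=(45_000, 52_000),
                      expected_null_rates={"shipping_cost": 0.08})
print(profile_anomalies(profile, 44_000, {"shipping_cost": 0.15}))
```

A run inside the documented ranges produces no flags, which is exactly the signal the agent needs to distinguish a legitimate upstream change from a breakage.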

Here's what a rules file looks like in practice — a glob-scoped rule that fires automatically whenever the agent opens a Python file:

otto/rules/code_standards_python.md

```markdown
---
otto:
  rule:
    alwaysApply: false
    description: Python coding standards for data pipeline development
    globs:
      - "*.py"
---

# Python Code Standards

## Null handling
- Never drop null records — flag them with a `_data_quality_flag` column
- Null in join keys: escalate to human review, do not substitute surrogate

## Schema conventions
- All primary keys are composite (account_id + event_timestamp) — no surrogate keys
- Timestamps always UTC, stored as TIMESTAMP_NTZ

## Error handling
- All API calls: exponential backoff, max 5 retries, log final failure to `pipeline_errors` table
- Schema mismatch: open PR with impact assessment, do not auto-deploy

## Output validation
- Row count delta > 10% from 7-day rolling average: flag and pause, do not publish
```

One file. One agent read. Every interaction with a Python file now benefits from that context — and conversations about YAML configs or SQL queries don't pay the token cost.
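The null-handling convention in that rule translates directly into pipeline code. Here is a minimal sketch, assuming pandas; the `_data_quality_flag` column name comes from the rule above, while the function name and example data are illustrative.

```python
import pandas as pd

def flag_nulls(df, checked_columns):
    """Apply the null-handling convention: never drop null records,
    mark them in a _data_quality_flag column instead."""
    out = df.copy()
    has_null = out[checked_columns].isna().any(axis=1)
    out["_data_quality_flag"] = [
        "null_in_checked_column" if flag else None for flag in has_null
    ]
    return out

df = pd.DataFrame({"order_id": [1, 2, 3], "shipping_cost": [4.99, None, 0.0]})
flagged = flag_nulls(df, ["shipping_cost"])
# All three rows survive; the second row carries the quality flag.
```

An agent that has read the rule writes this pattern instead of `dropna()`, which is precisely the difference between the two outcomes in the opening anecdote.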

Context engineering is not prompt engineering. Prompt engineering optimizes individual interactions. Context engineering optimizes the system — the standing information the agent has before it reads your prompt. If you have a week to invest in agentic quality, spend four days on the context stack and one on prompts.

Rules, commands, and agents

The team conventions layer is implemented through three programmable primitives — each with a different scope and lifecycle:

| Primitive | What it is | When it fires | Best for |
| --- | --- | --- | --- |
| Rule | Markdown file with persistent instructions | Automatically, based on scope | Coding standards, domain constraints, output format requirements |
| Command | Reusable prompt your team can invoke by name | On-demand | Reflection workflows, standardized reporting, repeatable analysis |
| Agent | Custom persona with its own instructions, model settings, and tool access | On-demand, replaces the default | Specialized roles: data quality validator, incident responder, schema reviewer |

Rule scoping

Context pollution is a real failure mode. A large rules library loaded into every conversation inflates token usage and can degrade agent focus. Three scoping options let you build a rich library without the bloat:

  • Always-on (alwaysApply: true) — loads into every conversation. Reserve this for high-value standing instructions: team-wide principles, output format requirements, learning prompts.
  • Glob-scoped (globs: ["*.py"]) — loads only when the agent is working with files that match the pattern. The code standards example above uses this: it fires for .py files, not SQL queries or YAML configs.
  • Keyword-scoped (keywords: [...]) — loads when specific phrases appear in the user's prompt. Appropriate for domain constraints that are irrelevant outside their context — scheduling rules that shouldn't fire during a schema design conversation, for example.

The practical question when writing any rule: how often is this actually relevant? If the answer is "always," make it always-on. If it's file-type or topic specific, scope it. The right scoping choice keeps the context window focused on what matters for the task at hand.
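The three scoping options reduce to a simple decision function. This is a sketch of the idea, not the platform's actual loader: it assumes the rule's frontmatter has already been parsed into a dict, and the function name is hypothetical.

```python
from fnmatch import fnmatch

def rule_applies(frontmatter, open_files, prompt):
    """Decide whether a rule loads, mirroring the three scoping options:
    always-on, glob-scoped, and keyword-scoped."""
    if frontmatter.get("alwaysApply"):
        return True
    globs = frontmatter.get("globs", [])
    if any(fnmatch(path, pattern) for path in open_files for pattern in globs):
        return True
    keywords = frontmatter.get("keywords", [])
    return any(kw.lower() in prompt.lower() for kw in keywords)

python_rule = {"alwaysApply": False, "globs": ["*.py"]}
print(rule_applies(python_rule, ["transform.py"], "refactor this"))  # True
print(rule_applies(python_rule, ["config.yaml"], "refactor this"))   # False
```

The glob-scoped rule loads only for `.py` files, so conversations about YAML configs never pay its token cost.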

How this works in Ascend

Otto's programmatic interface uses an otto/ directory in your project. Rules live at otto/rules/, commands at otto/commands/, and custom agents at otto/agents/. The YAML frontmatter in each file controls scoping — the same patterns shown here. Custom agents let you define specialized personas (a Data Quality Agent, a Schema Reviewer) that replace Otto's default instructions with purpose-built ones for that role. The architectural concept — layered, scoped context with reusable commands and specialized agents — applies to any agentic platform with configurable rule systems.

The learning loop

One of the most underused patterns in agentic systems: agents can distill their own learnings into the context stack that governs future interactions. The mechanism is two files — a learning rule and a learning command.

The learning rule is an always-on instruction that tells the agent to notice and propose rule updates whenever it encounters corrections, undocumented patterns, or ambiguity that slowed it down:

otto/rules/learning.md

```markdown
---
otto:
  rule:
    alwaysApply: true
    description: Guidelines for capturing learnings and improving project rules over time
---

# Learning and Rule Improvement

As you work and receive feedback, actively identify opportunities to improve project rules.

## When to propose rule changes

Propose adding or modifying rules when you observe:
- **User corrections**: A mistake the user had to correct
- **Project-specific patterns**: Conventions not covered by existing rules
- **Ambiguity resolution**: Clarifications that resolved unclear guidance

## Process

1. Identify the learning or pattern worth capturing
2. Check existing rules for overlap or conflicts
3. Propose the change with clear reasoning — what rule, why it's valuable, how it avoids duplication
4. Wait for confirmation before making changes
```

The learning command gives any team member a way to explicitly trigger a reflection session after complex work:

otto/commands/learning.md

```markdown
---
otto:
  command:
    description: Review this conversation and update project rules with any new learnings
---

Review this conversation for new patterns, conventions, mistakes, or lessons learned.
For each learning, propose creating a new rule file or updating an existing one.
Keep rules concise and use keyword or glob scoping where appropriate.
Wait for confirmation before making changes.
```

Run the command after any complex task — a schema migration, an incident investigation, a new data source integration. The agent reviews its own work, identifies what wasn't in the rules, and proposes additions. A human reviews, accepts what's accurate, discards what's wrong.

Over time, the context stack encodes the hard-won lessons from every complex task the team has run. Each session starts with slightly better context than the last. Teams that build this workflow find agent quality improves over months, not just weeks.

tip

The learning loop requires a deliberate workflow. An agent won't update its own rules unprompted. The always-on rule makes the agent notice opportunities; the command gives the team a trigger to act on them. Both are needed.

For a hands-on walkthrough of building out the full harness — learning rule, learning command, scoped rules, and a custom agent — see Lab 2: Programmatic Agentic Systems.

Intent specification craft

The prompts you write still matter. A well-structured prompt on top of a well-built context stack produces measurably better outcomes; research on systematic prompt optimization (SPRIG) shows that optimizing prompt structure alone yields measurable quality improvements.

Here are the dimensions that matter most — illustrated with a concrete example:

```markdown
# Intent specification — what good looks like

## Role
You are a pipeline reliability engineer for the data team.
Your focus is diagnosing pipeline failures and proposing
minimal, targeted fixes. You do not rewrite working code.

## Task
The orders_daily pipeline failed on the most recent run.
Investigate the root cause. If the upstream schema has changed,
identify every affected downstream component and propose a patch.

## Constraints
- Do not modify any downstream consumer configurations directly
- Do not merge to main — open a pull request only
- If you reach 10 tool calls without resolution, escalate with full reasoning trace

## Examples of correct output format
A PR description containing:
- Root cause (one sentence)
- Affected components (bulleted list)
- Fix summary (what changed and why)
- Validation evidence (output from staging run)

## Success criteria
Fix validated in staging, PR open with full reasoning,
zero production deploys.
```

The contrast with "fix the broken pipeline" is stark. Both are prompts to the same agent with the same context stack. The structured version is dramatically more likely to produce a useful result — because the agent now knows what it's working on, what it's allowed to do, what it's not, and what success looks like.

The most important structural element: clarity about outcome, not steps. Specify what success looks like, what the constraints are, and what the agent should not do. The agent will figure out the steps.
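Teams that issue many such prompts often template the structure so the sections are never skipped. This is a sketch of that idea; the function name and parameters are hypothetical, and the section headings mirror the example above.

```python
def build_intent_spec(role, task, constraints, output_format, success):
    """Assemble the structured sections into a single prompt string."""
    sections = [
        ("Role", role),
        ("Task", task),
        ("Constraints", "\n".join(f"- {c}" for c in constraints)),
        ("Examples of correct output format",
         "\n".join(f"- {o}" for o in output_format)),
        ("Success criteria", success),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)

spec = build_intent_spec(
    role="Pipeline reliability engineer. Propose minimal, targeted fixes.",
    task="Investigate the most recent orders_daily failure.",
    constraints=["Open a pull request only", "Escalate after 10 tool calls"],
    output_format=["Root cause (one sentence)", "Affected components"],
    success="Fix validated in staging, zero production deploys.",
)
```

Filling in the template forces the author to state the outcome and the constraints explicitly, which is the structural element that matters most.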

RAG vs. long-context: the cost-quality tradeoff

As your context stack grows, you face a fundamental design tradeoff: Retrieval-Augmented Generation (RAG) — retrieve the most relevant context on demand — or load everything and let the model work with the full picture (long-context)?

Watch out: "lost in the middle"

When relevant information appears in the middle of a long context window, performance degrades significantly — research shows highest quality occurs when the most important information is at the beginning or end. Loading the full document doesn't help if the critical content is buried.
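One simple mitigation, a heuristic inspired by this finding rather than a technique prescribed by the research, is to order context so the most relevant documents land at the edges of the window. The function name is illustrative.

```python
def order_for_edges(docs_by_relevance):
    """Place the most relevant documents at the start and end of the
    context window, pushing the least relevant toward the middle.
    Input must be sorted most-to-least relevant."""
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

print(order_for_edges(["A", "B", "C", "D", "E"]))  # ['A', 'C', 'E', 'D', 'B']
```

The two most relevant documents ("A" and "B") end up at the beginning and the end, where recall is strongest; the least relevant sink toward the middle.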

A long-context vs. RAG evaluation found that when resources allow full context loading, long-context models outperformed RAG-based approaches on output quality — though that result depends on having the relevant documents, an adequate token budget, and fitting within context limits. Within those constraints, loading the relevant documents fully can yield better answers than fragmenting them through retrieval.

| Approach | Quality | Cost | When to use |
| --- | --- | --- | --- |
| Long-context | Higher | Significantly higher | High-stakes decisions; tasks requiring full document understanding |
| RAG | Good | Much lower | High-volume routine tasks; when relevant context is predictable |
| Hybrid | Near long-context quality (in the studied settings, Li et al.) | Moderate | Production systems with varied task complexity |

The practical answer for most production systems: a hybrid approach. One implementation is the Self-Route pattern — a technique that dynamically decides per query whether to use RAG or full-context loading based on estimated task complexity, routing predictable queries to RAG and routing high-stakes or novel queries to long-context. Use RAG for high-volume routine tasks where the relevant context is predictable. Use long-context for high-stakes decisions where full document understanding matters. Prompt compression — which research has shown can reduce token usage by up to 60% with less than 5% accuracy impact — may extend the reach of long-context approaches without linear cost scaling, though results vary by task and model.
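A complexity-based router like the one described above can be sketched in a few lines. This is an illustration of the routing idea, not the Self-Route implementation from the literature; the function names, the threshold, and the toy complexity estimator are all assumptions of the sketch.

```python
def self_route(query, estimate_complexity, rag_answer, long_context_answer,
               threshold=0.7):
    """Route routine queries to RAG and high-complexity or high-stakes
    queries to long-context, per the hybrid pattern described above."""
    score = estimate_complexity(query)
    if score < threshold:
        return ("rag", rag_answer(query))
    return ("long_context", long_context_answer(query))

# Toy estimator: longer, multi-part questions score higher.
estimate = lambda q: min(1.0, len(q.split()) / 30)

route, _ = self_route("What is the null rate for shipping_cost?",
                      estimate, lambda q: "rag result", lambda q: "full result")
print(route)  # rag
```

In production the estimator would be a cheap model call or a rule set; the design point is that the expensive long-context path is paid for only by the queries that need it.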

MCP in production

Before reaching for MCP, the first question is whether you need it at all. Many context sources an agent needs — schema metadata, lineage graphs, orchestration state, monitoring data — are native capabilities of a mature data platform. They're already accessible to agents without any integration layer.

MCP (Model Context Protocol) is the emerging standard for connecting agents to external systems — tools your data platform doesn't natively expose: Slack, GitHub, PagerDuty, external data catalogs, or systems maintained by other teams. Rather than writing custom integration code for each external tool, MCP provides a standardized interface: one server implementation per external service, accessible to any compliant agent runtime.

When-to-use guidance:

  • Use native tools first. If your platform exposes schema metadata, lineage, and orchestration state natively, reach for those — no MCP required.
  • Use MCP when multiple agents need the same external system (Slack, GitHub, an external catalog). One server implementation, many agent clients.
  • Use MCP when the external system is maintained by another team and you need a stable, versioned interface to it over time.
  • Use direct integration for one-off external connections where MCP overhead isn't justified.
  • Verify client support in your specific agent runtime before depending on MCP — not all runtimes support MCP as a client yet.

AWS Prescriptive Guidance on MCP covers production deployment patterns including governance and permission scoping. The catch: an MCP server is also a potential attack surface. Scope permissions the same way you scope tool permissions on agents — minimum necessary, audit logged. For the governance framework that controls which tools agents can access, see Governance and Security.
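The "minimum necessary, audit logged" posture can be expressed as a deny-by-default allowlist around tool calls. This is a generic sketch, not MCP's actual permission API; the tool names, decorator, and allowlist are all hypothetical.

```python
import logging
from functools import wraps

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("mcp.audit")

# Minimum necessary: only the tools this agent actually needs.
ALLOWED_TOOLS = {"slack.post_message", "github.open_pr"}

def scoped(tool_name):
    """Deny-by-default wrapper: only allowlisted tools run, every call is logged."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if tool_name not in ALLOWED_TOOLS:
                audit.warning("denied tool call: %s", tool_name)
                raise PermissionError(f"{tool_name} is not in the agent's scope")
            audit.info("tool call: %s args=%s", tool_name, args)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@scoped("pagerduty.trigger")           # not in the allowlist: calls raise
def trigger_incident(summary):
    return "incident opened"

@scoped("slack.post_message")          # allowlisted: calls succeed, logged
def post_message(text):
    return "ok"
```

Whatever the mechanism, the property to preserve is the same one applied to native agent tools: nothing runs unless it was explicitly granted, and everything that runs leaves an audit trail.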

How this works in Ascend

Otto's agents access schema metadata, lineage graphs, and orchestration state as native platform capabilities — no MCP required for these. MCP in Ascend is for external connections: Slack, GitHub, PagerDuty, or external data catalogs your team maintains elsewhere. You configure MCP servers for external systems only; the core data engineering context is already built in. See Leveraging MCP Servers for Agentic Data Engineering for external connection configuration details. The architectural principle — use native tools where they exist, MCP for external systems — applies to any mature agentic stack.

Exercise: Build Your Context Stack

⏱ 15 minutes

You're asking Otto to draft a context stack for the orders_daily monitoring agent — then evaluating which layer it expands most thoroughly and where it asks for clarification.

The worked example below shows what a complete stack looks like. Now run the real version: open Otto and paste this:

```
Draft a complete context stack for an orders_daily pipeline monitoring agent. Include:

1. System instructions — role (pipeline reliability engineer), three things the agent must never do, and required output format (root cause + affected components + fix summary + validation evidence)

2. Rules files — suggest three files with appropriate scoping: one always-on team convention, one glob-scoped Python standard (fires on *.py files), one keyword-scoped scheduling constraint. Include the YAML frontmatter for each file.

3. Schema metadata — source: orders API v3; output: orders_daily table with composite key (order_id + event_date); ~48,000 daily rows; known quirk: API occasionally returns empty arrays instead of null

4. Data profile — expected daily row count (45,000–52,000), null rate for shipping_cost (8%, expected for free-shipping orders), order_status cardinality (7 values)

5. One past decisions log entry that would prevent a real mistake — for example, the null-handling decision or the composite key assumption

Format the output as files ready to save at otto/rules/ and otto/commands/.
```

Worked example — for reference:

```markdown
## Context Stack: orders_daily Monitoring Agent

### System instructions
Role: Pipeline reliability engineer.
Focus: diagnose and propose targeted fixes.
Never: Rewrite working code. Modify downstream configs without approval.
Output: Always include root cause, affected components, proposed fix, validation plan.

### Rules files (team conventions)
1. `code_standards_python.md` (glob-scoped: `*.py`) — null handling, error conventions, API retry logic
2. `operations_scheduling.md` (keyword-scoped) — deployment windows, approval requirements, on-call contacts
3. `data_contract.md` (always-on) — schema guarantees between orders_daily and downstream consumers

### Schema metadata
- Source: orders API v3 (field list, types, nullable columns documented)
- Output: orders_daily table (primary key: order_id + event_date)
- Known upstream quirk: orders API occasionally returns empty arrays instead of null

### Data profile
- Expected daily row count: 45,000–52,000 (7-day rolling average)
- Null rate in shipping_cost: 8% (expected — free-shipping orders)
- order_status cardinality: 7 values (documented in data_contract.md)

### Native platform tools
- Schema catalog — current upstream schema and any recent diffs
- Lineage graph — downstream dependency map for impact assessment

### External MCP connections (if needed)
- Incident ticketing system — open tickets on confirmed failures
- Slack — notify on-call when agent escalates

### Past decisions log
Location: /decisions/orders_daily_decisions.md
Retrieval: Load most recent 10 entries when agent activates on failure
```

What to notice: Which layer Otto expands most thoroughly, and which it asks you to clarify. The gaps in Otto's output — what it assumes or leaves blank — reveal where your team's knowledge is still implicit and undocumented. That's where the context stack investment has the highest leverage.

Key takeaways
  • Context engineering is the highest-leverage investment in agentic quality. The same model with a rich context stack outperforms a thin-context version of itself — materially, not marginally. Spend four days on the context stack before spending one day on prompts.
  • The learning loop compounds. After complex tasks, distill agent learnings into rules files. Over months, the context stack encodes the team's collective reasoning — and every future interaction benefits.
  • Long-context performs as well as or better than RAG in document-heavy tasks when context budget allows; RAG wins significantly on cost. Use a hybrid approach in production: RAG for high-volume routine tasks, long-context for high-stakes decisions where full document understanding matters. Prompt compression can reduce token usage substantially while preserving most accuracy — results vary by task and model.

You'll use this in practice

The layered context stack gets built directly in Capstone Lab →, where you'll create system instructions and three rules files that guide every future agent interaction with the Expeditions pipeline.

A well-engineered context stack produces better agent output — but "better" is relative unless you have a systematic way to verify it. The next module covers how to check agentic output before it reaches stakeholders.

Next: Trust and Verify: Testing Agentic Output →

Additional Reading