Family Guides
AWS2-CTX: Context, Memory, And Instruction Boundary Control
AWS2-CTX is about controlling the information that can steer an agent's behavior.
In ordinary business terms, this family asks: what instructions, documents, memories, search results, tool outputs, chat history, and handoff notes can the agent read; which of those sources is allowed to set policy; what should never be remembered or exported; and how does the organization know that an untrusted document did not quietly change what the agent was supposed to do?
This matters because agents do not only follow the latest user message. They may combine system instructions, project rules, skill instructions, retrieved documents, previous conversations, memory records, tool output, search results, and external content. If the trust order is unclear, a malicious document can tell the agent to ignore approvals, a stale memory can change a decision later, a tool output can become hidden instructions, or a handoff can leak private content into evidence. Reviewers need evidence that context sources are known, ranked, bounded, sanitized, and tested.
What This Family Covers
In scope:
- Instructions, project rules, user prompts, skill instructions, system messages, memory, conversation history, retrieved documents, search results, tool outputs, handoff notes, summaries, vector or retrieval stores, external data, and other context sources that can influence agent behavior.
- Trust and precedence relationships between context sources, including which sources may set policy, request actions, supply evidence, override prior context, or only provide data.
- Places where secrets, credentials, confidential data, private operational details, untrusted instructions, hidden prompt content, or unnecessary private content should not be stored, remembered, retrieved, summarized, or exported.
- Controls that stop lower-trust content from silently overriding higher-priority instructions, runtime policy, approval requirements, workspace boundaries, or security rules.
- Memory and durable-context write controls, including who or what may write persistent context, which workflows may use it, how it is reviewed, how long it is retained, and how material changes are attributed.
- Sanitization of handoffs, summaries, memory records, and evidence exports so they remain useful without copying raw secrets, confidential payloads, session material, hidden instructions, or unnecessary private content.
- Tests for instruction-boundary failures, context poisoning, retrieval poisoning, indirect prompt injection, tool-output poisoning, and memory interactions in high-impact workflows.
- Reviewable records of material changes to memory, retrieval corpora, context sources, instruction sources, or trust relationships that can affect high-impact workflows.
- Isolation or clean-context modes for high-risk workflows where lower-trust memory, retrieval corpora, shared context, or stale handoff state should not influence action review unless explicitly approved.
Out of scope:
- Deciding the complete system boundary, business purpose, owner map, and inventory of scoped systems. That belongs mostly to
AWS2-SCP. - Deciding which reusable skills, tools, connectors, prompt packs, packages, or supplier components are trusted sources. That belongs mostly to
AWS2-SRC, though those components may introduce context risks. - Enforcing allow, deny, approval, interruption, rollback, or budget decisions for actions. That belongs mostly to
AWS2-RUN. - Workspace sandboxing, filesystem boundaries, network egress, endpoint controls, and execution boundaries. Those belong mostly to
AWS2-WSB. - Secret and sensitive-data handling as a complete data-protection program. That belongs mostly to
AWS2-SEC, thoughAWS2-CTXidentifies places where sensitive material should not be stored or exported as context. - Complete log-retention, receipt integrity, or audit-trace design. That belongs mostly to
AWS2-LOG, though context-change and boundary-test evidence should be retained. - Full validation program design, red-team method, or finding lifecycle. That belongs mostly to
AWS2-VAL, though this family names the context-specific tests that should exist for high-impact workflows. - Legal review of prohibited practices, transparency duties, data protection, workplace monitoring, or biometric rules.
Level Summary
Levels are cumulative. Level 2 builds on Level 1, and Level 3 builds on both.
| Level | Plain-language meaning | Why this level exists | Typical evidence |
|---|---|---|---|
| Level 1 | The organization knows which context and instruction sources can steer agents, which sources are more trusted, and which places must not hold sensitive or unsafe content. | Context cannot be protected until reviewers know what the agent reads, remembers, retrieves, and treats as instructions. | Context-source inventory, instruction precedence rules, prohibited-storage list, redaction policy. |
| Level 2 | Production use has controls for lower-trust override, memory writes, durable context, and sanitized handoffs or evidence exports. | Managed production workflows need repeatable controls so untrusted content cannot silently change policy or persist unsafe state for later actions. | Runtime policy, memory write policy, memory change receipt, sanitized handoff examples, evidence export review. |
| Level 3 | High-impact workflows are tested for context-boundary attacks, retain records of material context changes, and can be isolated from lower-trust context. | High-impact workflows need stronger assurance that context attacks are tested, context changes are attributable, and review can happen in a clean context. | Prompt-injection tests, retrieval-poisoning tests, durable context change records, clean-context configuration, isolation test results. |
Candidate Controls
AWS2-CTX-L1-001: Context And Instruction Source Inventory Level 1
Requirement summary
Identify the context and instruction sources that can influence agent behavior, including user instructions, project instructions, skill instructions, retrieved documents, memory, and tool outputs where applicable. Distinguish trusted, user-provided, retrieved, external, generated, and lower-trust sources where practical.
Why it exists
Agents may treat many kinds of text or data as useful input. A reviewer needs to know which sources can affect agent behavior before deciding which ones can set rules, which ones only provide facts, which ones need sanitization, and which ones are risky enough to test.
Why this level
This belongs at Level 1 because source visibility is the foundation. Identifying sources does not prove the agent handles them safely, but it gives reviewers the map needed for precedence rules, memory controls, tests, and clean-context boundaries.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Context-source inventory | Runtime platform owner | Before production use and after context-source changes | Instruction sources, memory sources, retrieval sources, tool-output sources, generated summaries, handoffs, and external data that can influence the agent | Identifies likely sources; does not prove all hidden model or supplier context is visible. |
| Runtime context map | Runtime platform owner with workspace owner input | Before review and after runtime configuration changes | How user prompts, project rules, skill instructions, retrieval results, memory, and tool outputs enter the workflow | Supports review of context flow; does not prove the runtime enforces trust order. |
| Trust classification notes | Governance owner with runtime and evidence owner input | During initial scope review and periodic review | Which sources are trusted, user-provided, generated, retrieved, external, lower-trust, or unknown | Supports risk classification; does not prove lower-trust sources cannot influence behavior. |
AWS2-CTX-L1-002: Instruction Precedence And Trust Rules Level 1
Requirement summary
Define the intended precedence or trust relationship between instruction sources that can conflict, including which sources may set policy, request actions, provide evidence, or only supply data.
Why it exists
A retrieved document, tool output, or chat message can contain text that looks like an instruction. Without precedence rules, the agent or reviewer may not know whether to follow project policy, runtime policy, a user request, a skill instruction, a document instruction, or a stale memory.
Why this level
This belongs at Level 1 because every later control depends on knowing the intended trust order. The rule can be documented before the organization has full automated enforcement.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Instruction precedence policy | Runtime platform owner or governance owner | Before production use and after precedence changes | Which instruction sources outrank others, which sources can set policy, and which sources only provide data | Defines intent; does not prove runtime enforcement. |
| Conflict-handling examples | Runtime platform owner with evidence owner input | During design review and validation planning | Expected behavior when user content, retrieved content, tool output, memory, or project rules conflict | Supports reviewer understanding; does not prove every conflict type is covered. |
| Policy-to-runtime mapping | Runtime platform owner | Before production use and after runtime changes | How precedence expectations appear in runtime settings, prompts, policies, middleware, or review procedures | Supports implementation review; does not prove the model will always follow the rules. |
AWS2-CTX-L1-003: Prohibited Context Storage Locations Level 1
Requirement summary
Identify context locations where secrets, credentials, confidential data, or private operational details should not be stored, retrieved, summarized into memory, exported as evidence, or used as examples.
Why it exists
Context often gets copied. A secret can move from a file into a prompt, from a prompt into memory, from memory into a handoff, or from a handoff into an evidence packet. Reviewers need a clear list of places where sensitive or unsafe content should not go.
Why this level
This belongs at Level 1 because it is a basic boundary statement. The organization can name prohibited storage and export locations before it has complete automated scanning, redaction, or enforcement.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Prohibited-storage policy | Workspace or endpoint owner with runtime owner input | Before production use and after data-flow changes | Where secrets, credentials, confidential data, hidden prompt content, and private operational details must not be stored, retrieved, remembered, or exported | States expected handling; does not prove all content is detected or removed. |
| Context storage-location inventory | Runtime platform owner or evidence owner | Before review and after memory, retrieval, or export changes | Memory stores, vector stores, logs, summaries, handoffs, evidence exports, and examples that may hold context | Identifies storage points; does not prove the stored content is safe. |
| Sanitized example review | Evidence owner | During evidence preparation and periodic sampling | Example handoffs, summaries, memory records, or exports with sensitive values removed or replaced by safe placeholders | Supports redaction review; does not prove every historical record is sanitized. |
AWS2-CTX-L2-001: Lower-Trust Override Controls Level 2
Requirement summary
Enforce or document controls that prevent lower-trust content, retrieved content, tool output, or user-provided documents from silently overriding higher-priority instructions or approval requirements, including human-approval, runtime-policy, and boundary requirements.
Why it exists
Lower-trust content can contain instructions such as "ignore previous rules", "send this file externally", or "approval is no longer required". Production workflows need controls so these instructions cannot quietly bypass policy, approval gates, or workspace boundaries.
Why this level
This belongs at Level 2 because managed production use should have repeatable prevention, mediation, or documented compensating controls. Level 1 defines the intended trust order; Level 2 expects the organization to protect that order in real workflows.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Runtime policy or middleware control | Runtime platform owner | Before production use and after policy changes | Rules, middleware, prompts, or guardrails that keep lower-trust content from overriding approval, boundary, or security requirements | Supports enforcement review; does not prove prompt-injection immunity. |
| Lower-trust override test | Evidence or audit owner with runtime owner input | Before production use and during periodic validation | Scenario, lower-trust content, expected denial or escalation, actual result, finding, and remediation | Tests selected paths; does not prove all override attacks fail. |
| Approval-preservation review | Governance owner or evidence owner | During workflow review and after approval-rule changes | That lower-trust context cannot remove, weaken, or self-approve required human approval or runtime policy gates | Supports review of approval integrity; does not prove all approval paths are correctly configured. |
AWS2-CTX-L2-002: Memory And Durable Context Write Control Level 2
Requirement summary
Control memory or durable context writes that could affect future high-impact actions, including approval, review, owner expectations, retention, deletion, and change-attribution expectations for persistent changes.
Why it exists
Memory can make a temporary instruction durable. A wrong owner assumption, a false approval note, a private detail, or a poisoned retrieval hint can influence later work after the original conversation is forgotten. Production workflows need rules for what may be written, who or what approves it, how long it stays, and how it can be corrected.
Why this level
This belongs at Level 2 because durable memory is a production-state change. Level 1 identifies memory as a context source; Level 2 expects controls around writes that could affect later high-impact actions.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Memory write policy | Runtime platform owner with governance input | Before enabling durable memory and after memory-policy changes | Who or what may write memory, which workflows may use durable context, approval expectations, retention, deletion, and review rules | Defines memory governance; does not prove every write follows the rule. |
| Memory change receipt | Runtime platform owner or evidence owner | During operation and during review sampling | Actor or runtime, timestamp, source, workflow, reason, affected memory or context record, and review status where practical | Supports attribution; does not prove the memory content is true or harmless. |
| Retention and deletion review | Evidence owner with runtime owner input | During periodic review or when workflows are retired | Whether durable context records still have a valid purpose, owner, retention basis, and deletion path | Supports lifecycle review; does not prove all copies were removed from every system. |
AWS2-CTX-L2-003: Sanitized Handoffs, Summaries, Memory, And Evidence Exports Level 2
Requirement summary
Sanitize handoffs, summaries, memory records, and evidence exports to avoid storing secrets, credentials, session cookies, confidential payloads, untrusted instructions, hidden prompt content, or unnecessary private content.
Why it exists
Handoffs and evidence packets are meant to help humans or later agents continue work. They become risky when they copy raw secrets, private payloads, full prompt internals, hidden instructions, or untrusted content that later agents might treat as commands.
Why this level
This belongs at Level 2 because managed production evidence should be useful and reviewable without expanding exposure. Level 1 names prohibited storage locations; Level 2 expects repeatable sanitization for durable records and exports.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Handoff or evidence sanitization checklist | Evidence owner with runtime owner input | Before external review, audit packet creation, or workflow handoff | Required redactions, prohibited content types, summary boundaries, and reviewer responsibilities | Supports consistent sanitization; does not prove every sensitive value was detected. |
| Sanitized handoff sample | Evidence owner or workflow owner | During workflow handoff and review sampling | Useful task state, decisions, file references, and next steps without raw secrets, hidden instructions, or unnecessary private payloads | Demonstrates selected examples; does not prove all handoffs are safe. |
| Evidence export review log | Evidence or audit owner | Before sharing evidence internally for review and after export process changes | Export scope, reviewer, redaction outcome, withheld material, and rationale for included context | Supports export accountability; does not prove external sharing is legally sufficient. |
AWS2-CTX-L3-001: Instruction-Boundary And Context-Poisoning Tests Level 3
Requirement summary
Test instruction-boundary and context-poisoning scenarios for high-impact workflows, including untrusted documents, retrieved content, tool outputs, memory interactions, skill instructions, external data, and poisoned retrieval records.
Why it exists
Documented rules are not enough for high-impact workflows. The organization needs to test whether the agent resists realistic context attacks, such as indirect prompt injection in a document, poisoned retrieval results, malicious tool output, stale memory, or instructions hidden in external data.
Why this level
This belongs at Level 3 because it adds stronger assurance through testing. It is more demanding than documenting sources and controls, and it should focus on workflows where context failure could cause significant harm.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Instruction-boundary test summary | Evidence or audit owner with runtime owner input | Before high-impact production use and during recurring validation | Test cases, expected behavior, actual behavior, findings, remediation, and retest status | Tests selected scenarios; does not prove prompt-injection immunity. |
| Retrieval-poisoning or context-poisoning test | Evidence or audit owner with retrieval owner input | Before using retrieval for high-impact workflows and after retrieval changes | Poisoned document or record scenario, retrieval path, policy outcome, finding, and remediation | Supports selected retrieval-risk review; does not prove all corpus poisoning is prevented. |
| Tool-output poisoning test | Evidence or audit owner with tool owner input | During high-impact workflow validation | Whether malicious or misleading tool output can override instructions, approvals, or boundaries | Tests selected tool paths; does not prove every tool output is trustworthy. |
AWS2-CTX-L3-002: Material Context Change Records Level 3
Requirement summary
Retain reviewable records of material memory, retrieval, context, or instruction changes that can influence high-impact workflows, including actor, source, timestamp, rationale, and review status where practical.
Why it exists
High-impact workflows can change because a memory was edited, a retrieval corpus was updated, a project instruction changed, a new external data source was added, or a handoff became canonical context. Reviewers need to know what changed, who or what changed it, why, and whether it was reviewed.
Why this level
This belongs at Level 3 because stronger assurance requires durable, reviewable history for material context changes, not only current-state configuration.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Durable context change log | Runtime platform owner or evidence owner | During operation and before high-impact review | Actor or runtime, timestamp, source, context object, rationale, review status, and affected workflow where practical | Supports change traceability; does not prove the changed content is safe. |
| Retrieval corpus change record | Retrieval or knowledge-base owner | When retrieval sources are added, removed, reindexed, or materially changed | Source, change type, affected corpus, owner, review status, and rollback or correction path | Supports retrieval-change review; does not prove retrieved answers are correct. |
| Instruction-source review record | Governance owner with runtime owner input | When project, system, skill, policy, or workflow instructions materially change | Changed instruction source, reason, approver or reviewer, affected workflows, and effective date | Supports instruction-change accountability; does not prove the model will always follow the changed instruction. |
AWS2-CTX-L3-003: High-Risk Workflow Context Isolation Level 3
Requirement summary
Isolate high-risk workflows from lower-trust memory, retrieval corpora, or shared context unless the lower-trust source is explicitly approved for the workflow, and provide a clean context mode or equivalent boundary for high-impact action review where practical.
Why it exists
Some workflows should not inherit messy context. A high-impact review can be distorted by stale memory, unrelated chat history, broad retrieval corpora, external pages, or shared context from another matter. Clean context makes it easier to review the decision path and reduces the chance that lower-trust state affects a sensitive action.
Why this level
This belongs at Level 3 because it asks for stronger separation around high-risk workflows. It may require runtime features, operating procedures, or review discipline beyond ordinary production controls.
Evidence examples
| Evidence | Likely owner/provider | When collected | What it should show | Claim limit |
|---|---|---|---|---|
| Clean-context mode configuration or procedure | Runtime platform owner with workflow owner input | Before high-risk workflow use and after runtime changes | How memory, retrieval, chat history, external content, and shared context are limited or reset for high-impact review | Supports isolation review; does not prove all hidden context is absent. |
| Approved context-source list for high-risk workflow | Governance owner with runtime and workflow owner input | Before workflow approval and during periodic review | Which memory stores, retrieval corpora, documents, external sources, or handoffs are approved for the workflow | Supports source approval; does not prove approved sources are accurate or safe. |
| Context-isolation test result | Evidence or audit owner | Before high-impact production use and during recurring validation | Whether lower-trust memory, retrieval records, or unrelated shared context can influence the high-risk workflow | Tests selected isolation paths; does not prove all cross-context leakage is impossible. |
External Mapping Notes
The completed crosswalk treats AWS2-CTX as a candidate-control family shaped by instruction hierarchy, memory and vector-store security, RAG and data-flow threat modeling, prompt injection, context poisoning, tool-output poisoning, privacy, information integrity, and goal-drift signals.
Relevant source signals include:
- EU AI Act official sources: prohibited-practice, workplace-use, and disclosure signals inform boundary tests and prohibited-use records, but do not make
AWS2-CTXa legal-compliance control. - OWASP AISVS: memory, vector, and autonomous orchestration signals support testable context-handling expectations, while the public AISVS status remains early and not settled certification language.
- CSA MAESTRO: data poisoning, RAG risks, tampering, and exfiltration support threat-modeling and context-risk review.
- NIST AI 600-1: privacy, information-integrity, confabulation, and component risk signals support context-source inventories and retrieval validation, but enforcement evidence must come from the actual runtime and workspace.
- ISO/IEC 23894: context-customized AI risk-management guidance supports risk assessment and treatment notes, based only on public high-level source descriptions available in the current crosswalk.
- Five Eyes agentic AI guidance: indirect prompt injection, memory interaction, and goal-drift signals support prompt-injection tests, memory interaction logs, and adoption gates.
- MITRE ATLAS: prompt injection, context poisoning, RAG poisoning, and tool data poisoning support scenario design for validation and red-team work.
These mappings are informative. They support evidence for selected candidate controls and scenario design, but they do not prove prompt-injection immunity, legal compliance, external-framework conformance, or complete model robustness.
Formal Standard Link
Use this guide with the formal AWS2-CTX candidate requirements. If the guide and the standard draft disagree, the standard draft controls.