# The Prompt Injection Problem
Prompt injection is the attack where untrusted content manipulates an AI agent into taking unintended actions. Simon Willison describes the “Lethal Trifecta” of conditions that make this dangerous:

- Untrusted content in context — The agent processes data it didn’t generate
- Access to tools — The agent can take actions in the world
- Trusted user intent — The system assumes the agent acts on behalf of the user
## Char’s Defense Model
Char addresses each condition architecturally.

### Explicit Context Boundaries
The agent sees only:

| Source | Content |
|---|---|
| Conversation | User messages in the widget |
| Skills | SKILL.md files you’ve registered |
| Tool schemas | Names, descriptions, input schemas |
| Tool outputs | Results of tool invocations |

The agent never sees:
- Raw page DOM
- Cookies or localStorage
- Hidden fields or private state
- Other users’ conversations
- Network requests or responses
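As a minimal sketch of this boundary (type and field names are hypothetical, not Char’s actual API), the context handed to the agent can be modeled as a closed type built only from the four sources above — anything not explicitly added is structurally absent, not merely filtered out:

```typescript
// Hypothetical shape of the context assembled for the agent.
// Only these four sources are ever included; page DOM, cookies,
// and other browser state have no field here at all.
interface AgentContext {
  conversation: { role: "user" | "assistant"; text: string }[]; // widget messages
  skills: string[]; // contents of registered SKILL.md files
  toolSchemas: { name: string; description: string; inputSchema: object }[];
  toolOutputs: { tool: string; result: unknown }[]; // prior invocation results
}

// The context is built from explicit inputs only; there is no code
// path in this sketch that reads from the page or browser storage.
function buildContext(
  conversation: AgentContext["conversation"],
  skills: string[],
  toolSchemas: AgentContext["toolSchemas"],
  toolOutputs: AgentContext["toolOutputs"]
): AgentContext {
  return { conversation, skills, toolSchemas, toolOutputs };
}
```

Because the type has no slot for DOM content or cookies, “don’t expose this” becomes a property of the data structure rather than a rule the model is asked to follow.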
### Policy-Mediated Tool Access
All tool invocations flow through the Tool Hub, which enforces policy: the agent cannot invoke tools directly; every call is intercepted and evaluated.

### Tool Classification
Tools are categorized by risk level:

| Classification | Description | Example |
|---|---|---|
| read | Retrieves data without modification | getCustomer() |
| write | Modifies state | updateLead() |
| exfil | Sends data externally | sendEmail() |

The default policy for each classification:
- read — Automatic approval
- write — Requires user confirmation
- exfil — Requires explicit approval each time
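The graduated policy above can be sketched as a single check in the Hub. This is an illustrative sketch, not Char’s actual API; `evaluate` and `invoke` are hypothetical names:

```typescript
type Classification = "read" | "write" | "exfil";
type Decision = "allow" | "confirm" | "explicit-approval";

// Policy table mirroring the defaults above: reads auto-approve,
// writes require user confirmation, exfil requires approval each time.
function evaluate(classification: Classification): Decision {
  switch (classification) {
    case "read":
      return "allow";
    case "write":
      return "confirm";
    case "exfil":
      return "explicit-approval";
  }
}

interface Tool {
  name: string;
  classification: Classification;
  run(input: unknown): unknown;
}

// In this sketch the Hub is the only caller of run(); the agent
// requests an invocation and the Hub decides whether it proceeds.
function invoke(tool: Tool, input: unknown, userApproved: boolean): unknown {
  const decision = evaluate(tool.classification);
  if (decision !== "allow" && !userApproved) {
    throw new Error(`${tool.name}: ${decision} required`);
  }
  return tool.run(input);
}
```

The point of the pattern is that even a fully compromised agent can only produce invocation *requests*; the decision to execute lives outside the model.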
### Role-Based Visibility
Tools can be scoped by user role, reducing the attack surface each user’s agent exposes.

## Why Architectural Defense Matters
Behavioral defenses (“please don’t do bad things”) are unreliable:

| Defense Type | Approach | Limitation |
|---|---|---|
| Prompt engineering | System prompts instructing safe behavior | Can be overridden by injected content |
| Output filtering | Checking agent responses for malicious patterns | Attackers can encode instructions |
| Input sanitization | Filtering dangerous patterns from input | Incomplete coverage |

Architectural defenses, by contrast, are properties of the system rather than requests to the model:
| Defense Type | Approach | Property |
|---|---|---|
| Context isolation | Agent only sees explicit inputs | No unexpected data exposure |
| Policy enforcement | All tool calls pass through Hub | No direct tool access |
| Classification | Tools categorized by risk | Graduated approval requirements |
| Role scoping | Tools filtered by user role | Reduced attack surface |
## The Tool-Mediated Access Pattern
A key design principle: agents access application data only through explicitly registered tools. This inverts the typical pattern, where agents are given broad read access and trusted to behave appropriately. Why this matters for security:

- Explicit boundaries — You decide exactly what data the agent can access
- Documented interfaces — Tool schemas serve as contracts, making the attack surface auditable
- Minimal privilege — Each tool exposes only what’s needed for its function
For implementation details, see the WebMCP Tools guide.
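As a sketch of the registration side (the API shown is hypothetical and may differ from the WebMCP surface), each tool is an explicit, auditable contract — a name, a description, an input schema, and a handler that exposes exactly one narrow operation:

```typescript
// Hypothetical registry; each entry doubles as documentation of the
// attack surface, because the schema is the whole interface.
type ToolEntry = {
  description: string;
  inputSchema: object;
  handler: (input: any) => unknown;
};

const tools = new Map<string, ToolEntry>();

function registerTool(name: string, entry: ToolEntry): void {
  tools.set(name, entry);
}

// Illustrative app data and lookup; only the email field is ever
// reachable by the agent, not the rest of the customer record.
const customerEmails: Record<string, string> = { "42": "ada@example.com" };

registerTool("getCustomerEmail", {
  description: "Look up a customer's email address by customer ID",
  inputSchema: {
    type: "object",
    properties: { id: { type: "string" } },
    required: ["id"],
  },
  handler: ({ id }) => customerEmails[id] ?? null,
});
```

Auditing the agent’s data access then reduces to reading the registry: if a field is not returned by some registered handler, the agent cannot obtain it.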
## Kill Switch
In Tier 2 deployments, administrators can instantly disable:

- Individual tools — Disable a specific tool across all users
- Tool providers — Disable all tools from a domain
- Agent access — Disable agent functionality entirely
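A minimal sketch of how such switches can gate every invocation (flag and function names are hypothetical): because each call re-checks the flags, flipping one takes effect immediately, with no restart or redeploy.

```typescript
// Hypothetical admin-controlled switches, consulted before any
// tool call or agent turn.
const disabledTools = new Set<string>();
const disabledProviders = new Set<string>();
let agentEnabled = true;

// Checked in order of scope: global agent access, then the tool's
// provider domain, then the individual tool.
function isBlocked(toolName: string, providerDomain: string): boolean {
  return (
    !agentEnabled ||
    disabledProviders.has(providerDomain) ||
    disabledTools.has(toolName)
  );
}
```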

