THREAT MODEL · ARCHITECTURE-LEVEL · AUDITABLE
If your AI can act, your AI is a security boundary.
This threat model documents the primary failure modes of agentic systems and the exact mitigations required by the ProofGate Standard. The goal is simple: turn AI incidents into impossible states.
MEMETIC SUMMARY
Agents fail when authority is implicit
If tools are reachable and constraints are not explicit, untrusted input becomes real actions.
ProofGate makes authority explicit
Intent → policy gate → signed approval → signed receipt → append-only audit. Always.
THREAT INDEX
The failure modes that actually happen in production
These are not hypothetical. They are predictable outcomes of untrusted input, probabilistic planning, and over-scoped tool access.
Prompt Injection
Untrusted content becomes tool instructions.
Tool Scope Escalation
Over-broad permissions turn mistakes into incidents.
Cross-Tenant Leakage
Context or tools bleed across users or orgs.
TOCTOU Drift
Policy changes after approval but before execution.
Replay & Forgery
Approvals/requests get reused or spoofed.
Silent Mutations
Actions happen without immutable receipts/audit.
THREATS → MITIGATIONS
Each threat maps to enforceable requirements
ProofGate mitigations are not “best practices.” They are explicit gates and proofs you can audit.
THREAT
Prompt Injection (Untrusted Input → Tool Calls)
WHY IT HAPPENS
LLMs do not reliably separate data from instructions: any text in the context window can be read as a command. Emails, tickets, docs, and web pages can contain adversarial strings that steer the model into executing unintended actions.
IMPACT
- Unauthorized external emails/messages
- Data exfiltration via summaries or tool outputs
- Unintended edits/deletes in connected systems
- Credential/secret leakage if exposed to the model
DETECTION SIGNALS
- Tool calls initiated from content sources (email/docs/web)
- Unexpected external recipients/URLs
- Model output contains “ignore previous instructions” patterns
- Actions reference content not requested by the user
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Intent Envelope required
Model output must be structured intent; free-form text cannot be executed.
Policy gate before tools
Every decision (deny, require approval, or execute) is deterministic and made outside the model.
Allowlist domains + caps
Recipient domains, link domains, amounts, and action scopes are explicitly bounded.
Signed receipts + audit
Every decision and execution produces signed receipts and append-only audit evidence.
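The first three mitigations above can be sketched as a small deterministic gate. This is a minimal illustration, not the ProofGate reference implementation: the `IntentEnvelope` fields, the allowlists, and the cap values are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    EXECUTE = "execute"


@dataclass(frozen=True)
class IntentEnvelope:
    # Hypothetical fields; the standard's real schema may differ.
    action: str
    recipient: str
    amount_cents: int = 0


# Illustrative policy values, not taken from the ProofGate Standard.
ALLOWED_ACTIONS = {"send_email", "issue_refund"}
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}
AUTO_EXECUTE_CAP_CENTS = 5_000
HARD_CAP_CENTS = 50_000


def gate(intent: IntentEnvelope) -> Decision:
    """Pure function of intent and policy: no model call, no randomness."""
    if intent.action not in ALLOWED_ACTIONS:
        return Decision.DENY
    # Everything after the last "@" is treated as the recipient domain.
    domain = intent.recipient.rpartition("@")[2].lower()
    if domain not in ALLOWED_RECIPIENT_DOMAINS:
        return Decision.DENY
    if intent.amount_cents < 0 or intent.amount_cents > HARD_CAP_CENTS:
        return Decision.DENY
    if intent.amount_cents > AUTO_EXECUTE_CAP_CENTS:
        return Decision.REQUIRE_APPROVAL
    return Decision.EXECUTE
```

Because the gate only accepts a typed envelope, free-form model text has no path to execution: it either parses into an `IntentEnvelope` or it is rejected before the gate runs.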
THREAT
Tool Scope Escalation (Too Much Power)
WHY IT HAPPENS
Many agent systems wire tools with broad permissions to reduce friction. This makes every model error and every adversarial prompt a high-impact event.
IMPACT
- Account takeover within SaaS tools
- Mass deletes/edits
- Unauthorized approvals or payments
- Privilege escalation across internal systems
DETECTION SIGNALS
- Tools configured with admin scopes by default
- Tool router lacks per-action allowlists
- Model can access secrets directly
- No human approval thresholds
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
No secrets in models
Credentials must never be passed to the model. Tools are reachable only through a router.
Least privilege router
Scopes and permissions must be minimized per action. Deny by default.
Approval thresholds
High-risk actions require signed approvals; approvals expire and are single-use.
Receipts for causality
Signed receipts allow post-incident attribution and rollback analysis.
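A deny-by-default router with per-action scopes might look like the sketch below. The action names, scope strings, and the `ToolRouter` class are hypothetical, chosen only to illustrate the pattern; credentials would live inside the handlers and never be returned to the model.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical mapping of action -> scopes it needs. Unlisted actions are denied.
ACTION_SCOPES: Dict[str, FrozenSet[str]] = {
    "read_ticket": frozenset({"tickets:read"}),
    "close_ticket": frozenset({"tickets:write"}),
}


class ScopeError(PermissionError):
    """Raised when an action is not allowlisted or lacks a granted scope."""


class ToolRouter:
    def __init__(
        self,
        granted_scopes: FrozenSet[str],
        handlers: Dict[str, Callable[[dict], dict]],
    ) -> None:
        self._granted = granted_scopes
        self._handlers = handlers  # handlers hold credentials; the model never sees them

    def call(self, action: str, args: dict) -> dict:
        required = ACTION_SCOPES.get(action)
        if required is None or action not in self._handlers:
            raise ScopeError(f"action not allowlisted: {action}")
        if not required <= self._granted:
            raise ScopeError(f"missing scopes for {action}")
        return self._handlers[action](args)
```

Granting only `tickets:read` means a model error or injected instruction that asks for `close_ticket` fails at the router instead of reaching the external system.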
THREAT
Cross-Tenant Leakage (Context & Tools Bleed)
WHY IT HAPPENS
Stateful assistants can accidentally mix memory, retrieval, or tool results across users/orgs when boundaries are not explicit and enforced.
IMPACT
- Data leakage between customers
- Accidental disclosure of internal documents
- Wrong customer receives communications
- Regulatory and contractual exposure
DETECTION SIGNALS
- Shared caches without tenant keying
- Retrieval results include other tenant identifiers
- Tools invoked with shared credentials
- Logs show cross-tenant IDs in responses
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Tenant-bound intents
Intent Envelopes must include tenant identity and be validated for isolation.
Router isolation
Tool credentials and access must be tenant-scoped; no shared tool authority.
Audit by tenant
Append-only audit must record tenant context and enable forensic partitioning.
Receipt hashing
Hash and sign intent/execution so cross-tenant mixups are provable.
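Two of the checks above, tenant-bound intents and tenant-keyed shared state, can be sketched in a few lines. The `tenant_id` field name and both helpers are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple


class TenantMismatch(Exception):
    """Raised when an intent claims a tenant other than the session's tenant."""


class TenantCache:
    """Cache whose keys always include the tenant ID, so lookups cannot cross tenants."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}

    def put(self, tenant_id: str, key: str, value: str) -> None:
        self._data[(tenant_id, key)] = value

    def get(self, tenant_id: str, key: str) -> Optional[str]:
        return self._data.get((tenant_id, key))


def validate_tenant(intent: dict, session_tenant_id: str) -> None:
    # The tenant comes from the authenticated session, never from model output alone.
    if intent.get("tenant_id") != session_tenant_id:
        raise TenantMismatch(
            f"intent tenant {intent.get('tenant_id')!r} != session tenant {session_tenant_id!r}"
        )
```

The design choice is that the tenant ID is part of the key type itself, so a lookup that omits it cannot be written by accident.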
THREAT
TOCTOU Drift (Time-of-Check / Time-of-Use)
WHY IT HAPPENS
A decision can be evaluated under one policy state, then executed later under a different policy state if the system does not re-check at execution time.
IMPACT
- Previously allowed action executes after policy is tightened
- Approval token used after conditions change
- Race conditions enable bypass
- Inconsistent enforcement across distributed workers
DETECTION SIGNALS
- Approvals not bound to intent hash
- Approvals used minutes/hours later without re-check
- Policy changes without invalidation mechanism
- Async execution without deterministic guard
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Re-check at approval/execution
System must re-run the gate at time of execution with current policy state.
Expiring approvals
Approval tokens expire quickly and are one-time use.
Intent-hash binding
Approvals must be tied to the exact canonical intent hash.
Signed receipts
Receipt includes timestamps and hashes to reconstruct state transitions.
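The re-check and hash-binding requirements can be sketched as follows. The canonical form here is simply JSON with sorted keys and compact separators, which is a simplification (a stricter scheme such as RFC 8785 JCS may be what a real deployment wants); the `Policy` class and `execute` function are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


def intent_hash(intent: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class Policy:
    # Illustrative single-rule policy.
    max_amount_cents: int


def is_allowed(intent: dict, policy: Policy) -> bool:
    return intent["amount_cents"] <= policy.max_amount_cents


def execute(intent: dict, approved_hash: str, policy: Policy) -> str:
    # 1. The intent being executed must be exactly the one that was approved.
    if intent_hash(intent) != approved_hash:
        return "denied: intent changed after approval"
    # 2. Re-run the gate against the policy as it is now, not as it was at approval.
    if not is_allowed(intent, policy):
        return "denied: policy no longer allows this intent"
    return "executed"
```

If the policy is tightened between approval and execution, step 2 rejects the action even though the approval itself is still valid.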
THREAT
Replay & Forgery (Approvals / Actions)
WHY IT HAPPENS
If approvals are guessable or unsigned, attackers can forge them. If approvals are reusable, attackers can replay them.
IMPACT
- Unauthorized execution using forged approvals
- Repeated execution using replayed approvals
- Bypassing human-in-the-loop controls
- Undetectable abuse if logging is weak
DETECTION SIGNALS
- Approval tokens are predictable (e.g., appr_intentId)
- Approvals lack expiry
- Approvals not one-time use
- Approvals not tied to intent hash
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Signed approval tokens
Approvals must be cryptographically signed and verifiable.
Expiry + one-time use
Tokens expire and are invalidated after execution.
Intent hash binding
Token payload includes the intentHash and is verified against stored pending state.
Audit evidence
The append-only log records the approve → execute chain; receipts prove the sequence.
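A minimal sketch of a signed, expiring, single-use approval token bound to an intent hash is shown below. It uses an HMAC as a stand-in for whatever signature scheme the deployment actually uses; the token layout, the `SECRET` key, and the in-memory nonce set are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional, Set

SECRET = b"demo-only-key"  # assumption: in practice, load from a secret manager
_used_nonces: Set[str] = set()  # assumption: a durable store in a real system


def issue_approval(intent_hash: str, ttl_s: int = 60, now: Optional[float] = None) -> str:
    issued = time.time() if now is None else now
    claims = {"intentHash": intent_hash, "nonce": secrets.token_urlsafe(16), "exp": issued + ttl_s}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload).decode() + "." + base64.urlsafe_b64encode(sig).decode()


def verify_approval(token: str, pending_intent_hash: str, now: Optional[float] = None) -> bool:
    current = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # malformed token (binascii.Error is a ValueError subclass)
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False  # forged or tampered
    claims = json.loads(payload)
    if claims["exp"] < current:
        return False  # expired
    if claims["intentHash"] != pending_intent_hash:
        return False  # approval was issued for a different intent
    if claims["nonce"] in _used_nonces:
        return False  # replay
    _used_nonces.add(claims["nonce"])  # consume: one-time use
    return True
```

The random nonce is what makes the token unguessable and single-use; a predictable identifier such as `appr_intentId` provides neither property.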
THREAT
Silent Mutations (No Receipts / No Audit)
WHY IT HAPPENS
Agent systems often act through external tools without producing immutable proof. When something goes wrong, root cause becomes subjective.
IMPACT
- No attribution of who approved what
- No deterministic replay of decision chain
- Inability to distinguish tampering from a bug or a model error
- Operational chaos in incident response
DETECTION SIGNALS
- Actions lack IDs and hashes
- Logs are mutable or missing
- No signed artifacts
- Only UI logs exist (not durable)
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Signed decision receipts
Every decision emits a signed receipt with intent hash.
Signed execution receipts
Every execution emits a signed receipt with execution hash.
Append-only audit
Audit events are written as immutable JSONL lines (memory).
Deterministic router
Only the router performs side effects; receipts reference the router result.
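One way to make JSONL audit lines tamper-evident is to chain each line to the hash of the previous one and sign the result. The hash chain is an addition made for this sketch, not something the requirements above spell out, and the HMAC again stands in for a real signature scheme; `AuditLog`, `verify_chain`, and the event fields are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Iterable, List, Tuple

SIGNING_KEY = b"demo-only-key"  # assumption: a real deployment would use a managed key
GENESIS = "0" * 64  # placeholder "previous hash" for the first line


def _digest(event: dict, prev: str) -> str:
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def _sign(digest: str) -> str:
    return hmac.new(SIGNING_KEY, digest.encode("utf-8"), hashlib.sha256).hexdigest()


class AuditLog:
    """In-memory append-only log; each JSONL line commits to the one before it."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def append(self, event: dict) -> dict:
        prev = json.loads(self._lines[-1])["hash"] if self._lines else GENESIS
        digest = _digest(event, prev)
        receipt = {"event": event, "prev": prev, "hash": digest, "sig": _sign(digest)}
        self._lines.append(json.dumps(receipt, sort_keys=True))
        return receipt

    def lines(self) -> Tuple[str, ...]:
        return tuple(self._lines)  # read-only view; no update or delete API exists


def verify_chain(lines: Iterable[str]) -> bool:
    prev = GENESIS
    for line in lines:
        r = json.loads(line)
        digest = _digest(r["event"], r["prev"])
        if r["prev"] != prev or r["hash"] != digest:
            return False  # line edited, reordered, or removed
        if not hmac.compare_digest(r["sig"], _sign(digest)):
            return False  # receipt not produced by the key holder
        prev = digest
    return True
```

A decision receipt and an execution receipt appended in order then form a verifiable pair: editing either line, or dropping the first, makes `verify_chain` return `False`.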