THREAT MODEL · ARCHITECTURE-LEVEL · AUDITABLE

If your AI can act, your AI is a security boundary.

This threat model documents the primary failure modes of agentic systems and the exact mitigations required by the ProofGate Standard. The goal is simple: turn AI incidents into impossible states.

MEMETIC SUMMARY

Agents fail when authority is implicit

If tools are reachable and constraints are not explicit, untrusted input turns into real actions.

ProofGate makes authority explicit

Intent → policy gate → signed approval → signed receipt → append-only audit. Always.
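The five-stage flow above can be sketched end to end. This is a minimal illustration, not the ProofGate implementation: the key, field names, and helper functions (`canonical_hash`, `sign`) are assumptions chosen for the sketch.

```python
# Illustrative sketch of: intent -> policy gate -> signed receipt -> audit line.
# The HMAC key and all names here are hypothetical, not the ProofGate API.
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key"  # illustrative only; real keys live in a KMS

def canonical_hash(obj: dict) -> str:
    """Hash the canonical (sorted-key) JSON form of an intent or receipt."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def sign(payload: str) -> str:
    """Sign a payload so receipts are verifiable after the fact."""
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()

# 1. Model output is a structured intent, never free-form text.
intent = {"action": "send_email", "to": "ops@example.com", "tenant": "acme"}
intent_hash = canonical_hash(intent)

# 2. The policy gate decides deterministically, outside the model.
decision = "require_approval"

# 3. The decision produces a signed receipt bound to the intent hash.
receipt = {"intentHash": intent_hash, "decision": decision, "ts": time.time()}
receipt["sig"] = sign(canonical_hash(receipt))

# 4. The receipt is appended to the audit log as one JSONL line.
audit_line = json.dumps(receipt, sort_keys=True)
```

Every later stage (approval, execution) repeats the same pattern: hash, sign, append.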

THREATS → MITIGATIONS

Each threat maps to enforceable requirements

ProofGate mitigations are not “best practices.” They are explicit gates and proofs you can audit.

THREAT

Prompt Injection (Untrusted Input → Tool Calls)

WHY IT HAPPENS
LLMs treat text as instructions. Emails, tickets, docs, and web pages can contain adversarial strings that steer the model into executing unintended actions.
IMPACT
  • Unauthorized external emails/messages
  • Data exfiltration via summaries or tool outputs
  • Unintended edits/deletes in connected systems
  • Credential/secret leakage if exposed to the model
DETECTION SIGNALS
  • Tool calls initiated from content sources (email/docs/web)
  • Unexpected external recipients/URLs
  • Model output contains “ignore previous instructions” patterns
  • Actions reference content not requested by the user
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Intent Envelope required
Model output must be structured intent; free-form text cannot be executed.
Policy gate before tools
Deny/require approval/execute decisions are deterministic and external to the model.
Allowlist domains + caps
Recipient domains, link domains, amounts, and action scopes are explicitly bounded.
Signed receipts + audit
Every decision and execution produces signed receipts and append-only audit evidence.
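A deterministic gate of the kind required above might look like the following sketch. The action names, allowlist, and cap are illustrative assumptions; the point is that every rule is explicit and unknown actions are denied by default.

```python
# Hedged sketch of a deterministic policy gate. The model never calls tools
# directly; its structured intent is evaluated against explicit bounds.
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}  # illustrative allowlist
MAX_AMOUNT = 100.0                           # illustrative cap

def gate(intent: dict) -> str:
    """Return 'deny', 'require_approval', or 'execute' from explicit rules."""
    action = intent.get("action")
    if action not in {"send_email", "pay_invoice"}:
        return "deny"  # deny by default: unknown actions never execute
    if action == "send_email":
        domain = intent["to"].rsplit("@", 1)[-1]
        if domain not in ALLOWED_RECIPIENT_DOMAINS:
            return "deny"  # recipient domain not allowlisted
        return "execute"
    # pay_invoice: amounts above the cap require a signed human approval
    if intent.get("amount", 0) > MAX_AMOUNT:
        return "require_approval"
    return "execute"
```

Because the gate is a pure function of the intent and policy state, its decisions are reproducible in an audit.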
THREAT

Tool Scope Escalation (Too Much Power)

WHY IT HAPPENS
Many agent systems wire tools with broad permissions to reduce friction. This makes every model error and every adversarial prompt a high-impact event.
IMPACT
  • Account takeover within SaaS tools
  • Mass deletes/edits
  • Unauthorized approvals or payments
  • Privilege escalation across internal systems
DETECTION SIGNALS
  • Tools configured with admin scopes by default
  • Tool router lacks per-action allowlists
  • Model can access secrets directly
  • No human approval thresholds
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
No secrets in models
Credentials must never be passed to the model. Tools behind a router only.
Least privilege router
Scopes and permissions must be minimized per action. Deny by default.
Approval thresholds
High-risk actions require signed approvals; approvals expire and are one-time.
Receipts for causality
Signed receipts allow post-incident attribution and rollback analysis.
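A least-privilege router can be sketched as a per-action registry; the scope and credential names below are invented for illustration. Credentials are resolved inside the router boundary and are never returned to the model.

```python
# Sketch of a least-privilege tool router (illustrative names).
# One minimal scope per action; anything unregistered is denied.
ACTION_SCOPES = {
    "send_email": {"scope": "mail.send", "cred": "mail-service-token"},
    "read_ticket": {"scope": "tickets.read", "cred": "tickets-ro-token"},
}

def route(action: str, args: dict) -> dict:
    entry = ACTION_SCOPES.get(action)
    if entry is None:
        # Deny by default: no per-action allowlist entry, no execution.
        raise PermissionError(f"action {action!r} not allowlisted")
    # The credential is resolved here, inside the router, and deliberately
    # excluded from anything the model can see or log.
    _credential = entry["cred"]
    return {"action": action, "scope": entry["scope"], "args": args}
```

Admin-scope defaults disappear because each action carries only the scope it needs.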
THREAT

Cross-Tenant Leakage (Context & Tools Bleed)

WHY IT HAPPENS
Stateful assistants can accidentally mix memory, retrieval, or tool results across users/orgs when boundaries are not explicit and enforced.
IMPACT
  • Data leakage between customers
  • Accidental disclosure of internal documents
  • Wrong customer receives communications
  • Regulatory and contractual exposure
DETECTION SIGNALS
  • Shared caches without tenant keying
  • Retrieval results include other tenant identifiers
  • Tools invoked with shared credentials
  • Logs show cross-tenant IDs in responses
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Tenant-bound intents
Intent Envelopes must include tenant identity and be validated for isolation.
Router isolation
Tool credentials and access must be tenant-scoped; no shared tool authority.
Audit by tenant
Append-only audit must record tenant context and enable forensic partitioning.
Receipt hashing
Hash and sign intent/execution so cross-tenant mixups are provable.
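Two of these mitigations compose naturally: key credentials by tenant, and include the tenant ID inside the hashed intent. The layout below is a sketch with invented tenant and token names, not the Standard's schema.

```python
# Sketch of tenant isolation at the router. Credentials are keyed by
# (tenant, action) with no shared fallback, and the intent hash covers the
# tenant ID, so a cross-tenant mixup changes the hash and is provable.
import hashlib
import json

TENANT_CREDS = {
    ("acme", "send_email"): "acme-mail-token",
    ("globex", "send_email"): "globex-mail-token",
}

def tenant_intent_hash(intent: dict) -> str:
    """Hash an intent that must carry its tenant identity."""
    assert "tenant" in intent, "intent must be tenant-bound"
    return hashlib.sha256(json.dumps(intent, sort_keys=True).encode()).hexdigest()

def resolve_credential(tenant: str, action: str) -> str:
    """Tenant-scoped lookup; missing entries deny rather than fall back."""
    try:
        return TENANT_CREDS[(tenant, action)]
    except KeyError:
        raise PermissionError(f"no credential for {tenant}/{action}")
```

Identical actions for different tenants produce different intent hashes, which is what makes leakage provable from receipts alone.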
THREAT

TOCTOU Drift (Time-of-Check / Time-of-Use)

WHY IT HAPPENS
A decision can be evaluated under one policy state, then executed later under a different policy state if the system does not re-check at execution time.
IMPACT
  • A previously allowed action executes after the policy is tightened
  • Approval token used after conditions change
  • Race conditions enable bypass
  • Inconsistent enforcement across distributed workers
DETECTION SIGNALS
  • Approvals not bound to intent hash
  • Approvals used minutes/hours later without re-check
  • Policy changes without invalidation mechanism
  • Async execution without deterministic guard
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Re-check at approval/execution
System must re-run the gate at time of execution with current policy state.
Expiring approvals
Approval tokens expire quickly and are one-time use.
Intent-hash binding
Approvals must be tied to the exact canonical intent hash.
Signed receipts
Receipt includes timestamps and hashes to reconstruct state transitions.
THREAT

Replay & Forgery (Approvals / Actions)

WHY IT HAPPENS
If approvals are guessable or unsigned, attackers can forge them. If approvals are reusable, attackers can replay them.
IMPACT
  • Unauthorized execution using forged approvals
  • Repeated execution using replayed approvals
  • Bypassing human-in-the-loop controls
  • Undetectable abuse if logging is weak
DETECTION SIGNALS
  • Approval tokens are predictable (e.g., appr_intentId)
  • Approvals lack expiry
  • Approvals not one-time use
  • Approvals not tied to intent hash
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Signed approval tokens
Approvals must be cryptographically signed and verifiable.
Expiry + one-time use
Tokens expire and are invalidated after execution.
Intent hash binding
Token payload includes the intentHash and is verified against stored pending state.
Audit evidence
Append-only log records approve→execute chain; receipts prove sequence.
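An approval token meeting all four requirements can be sketched with an HMAC; the Standard requires signed, expiring, one-time, intent-bound tokens but does not prescribe this particular scheme, and the key and registry here are illustrative.

```python
# Sketch of an unforgeable, single-use approval token (assumed HMAC scheme).
import hashlib
import hmac
import json
import time

KEY = b"approval-signing-key"  # illustrative; real keys belong in a KMS
_used: set[str] = set()        # one-time-use registry

def issue(intent_hash: str, ttl: float = 300) -> dict:
    """Issue a signed token bound to one intent hash, with an expiry."""
    payload = {"intentHash": intent_hash, "exp": time.time() + ttl}
    body = json.dumps(payload, sort_keys=True)
    return {"payload": payload,
            "sig": hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()}

def verify(token: dict, intent_hash: str) -> bool:
    """Reject forged, rebound, expired, or replayed tokens."""
    body = json.dumps(token["payload"], sort_keys=True)
    expected = hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        return False  # forged or altered signature
    p = token["payload"]
    if p["intentHash"] != intent_hash or time.time() > p["exp"]:
        return False  # bound to a different intent, or expired
    if token["sig"] in _used:
        return False  # replay: tokens are one-time use
    _used.add(token["sig"])
    return True
```

Contrast this with a predictable token like `appr_intentId`: anyone who knows the intent ID can mint it, and nothing stops reuse.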
THREAT

Silent Mutations (No Receipts / No Audit)

WHY IT HAPPENS
Agent systems often act through external tools without producing immutable proof. When something goes wrong, root cause becomes subjective.
IMPACT
  • No attribution of who approved what
  • No deterministic replay of decision chain
  • Inability to prove tampering vs bug vs model error
  • Operational chaos in incident response
DETECTION SIGNALS
  • Actions lack IDs and hashes
  • Logs are mutable or missing
  • No signed artifacts
  • Only UI logs exist (not durable)
REQUIRED MITIGATIONS (PROOFGATE STANDARD)
Signed decision receipts
Every decision emits a signed receipt with intent hash.
Signed execution receipts
Every execution emits a signed receipt with execution hash.
Append-only audit
Audit events are written as immutable JSONL lines (memory).
Deterministic router
Only the router performs side effects; receipts reference the router result.