THREAT MODEL · ARCHITECTURE-LEVEL · AUDITABLE

If your AI can act, your AI is a security boundary.

This threat model documents the primary failure modes of agentic systems and the exact mitigations required by the ProofGate Standard. The goal is simple: turn AI incidents into impossible states.

MEMETIC SUMMARY

Agents fail when authority is implicit

If tools are reachable and constraints are not explicit, untrusted input becomes real actions.

ProofGate makes authority explicit

Intent → policy gate → signed approval → signed receipt → append-only audit. Always.

THREAT INDEX

The failure modes that actually happen in production

These are not hypothetical. They are predictable outcomes of untrusted input, probabilistic planning, and over-scoped tool access.

Prompt Injection

Untrusted content becomes tool instructions.

Tool Scope Escalation

Over-broad permissions turn mistakes into incidents.

Cross-Tenant Leakage

Context or tools bleed across users or orgs.

TOCTOU Drift

Policy changes after approval but before execution.

Replay & Forgery

Approvals/requests get reused or spoofed.

Silent Mutations

Actions happen without immutable receipts/audit.

THREATS → MITIGATIONS

Each threat maps to enforceable requirements

ProofGate mitigations are not “best practices.” They are explicit gates and proofs you can audit.

THREAT

Prompt Injection (Untrusted Input → Tool Calls)

WHY IT HAPPENS

LLMs do not reliably separate data from instructions: any text in context can be read as a command. Emails, tickets, docs, and web pages can contain adversarial strings that steer the model into executing unintended actions.

IMPACT

Unauthorized external emails/messages
Data exfiltration via summaries or tool outputs
Unintended edits/deletes in connected systems
Credential/secret leakage if exposed to the model

DETECTION SIGNALS

Tool calls initiated from content sources (email/docs/web)
Unexpected external recipients/URLs
Model output contains “ignore previous instructions” patterns
Actions reference content not requested by the user

REQUIRED MITIGATIONS (PROOFGATE STANDARD)

Intent Envelope required

Model output must be structured intent; free-form text cannot be executed.

Policy gate before tools

The gate's decision (deny, require approval, or execute) is deterministic and made outside the model.

Allowlist domains + caps

Recipient domains, link domains, amounts, and action scopes are explicitly bounded.

Signed receipts + audit

Every decision and execution produces signed receipts and append-only audit evidence.
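
To make the first three mitigations concrete, here is a minimal sketch of an intent parser and a deterministic policy gate. The field names (`action`, `recipient`, `amount`), the allowlists, and the caps are assumptions chosen for illustration; they are not the ProofGate schema.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    EXECUTE = "execute"


@dataclass(frozen=True)
class Intent:
    """Structured intent (hypothetical fields, not the ProofGate schema)."""
    action: str
    recipient: str
    amount: float = 0.0


# Hypothetical policy values, chosen only for illustration.
ALLOWED_ACTIONS = {"send_email", "issue_refund"}
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}
AUTO_EXECUTE_CAP = 50.0  # amounts above this need a human approval
HARD_CAP = 500.0         # amounts above this are always denied


def parse_intent(model_output: str) -> Optional[Intent]:
    """Accept only a JSON object with the expected fields; reject free text."""
    try:
        data = json.loads(model_output)
        return Intent(
            action=str(data["action"]),
            recipient=str(data["recipient"]),
            amount=float(data.get("amount", 0.0)),
        )
    except (ValueError, KeyError, TypeError, AttributeError, OverflowError):
        return None


def policy_gate(intent: Intent) -> Decision:
    """Deterministic decision made outside the model; anything unknown is denied."""
    if intent.action not in ALLOWED_ACTIONS:
        return Decision.DENY
    if "@" not in intent.recipient:
        return Decision.DENY
    domain = intent.recipient.rpartition("@")[2].lower()
    if domain not in ALLOWED_RECIPIENT_DOMAINS:
        return Decision.DENY
    # Written as a range check so NaN also fails closed.
    if not (0 <= intent.amount <= HARD_CAP):
        return Decision.DENY
    if intent.amount > AUTO_EXECUTE_CAP:
        return Decision.REQUIRE_APPROVAL
    return Decision.EXECUTE
```

The same input always yields the same decision, and an injected instruction in an email body never reaches a tool unless it survives parsing and every bound.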

THREAT

Tool Scope Escalation (Too Much Power)

WHY IT HAPPENS

Many agent systems wire tools with broad permissions to reduce friction. This makes every model error and every adversarial prompt a high-impact event.

IMPACT

Account takeover within SaaS tools
Mass deletes/edits
Unauthorized approvals or payments
Privilege escalation across internal systems

DETECTION SIGNALS

Tools configured with admin scopes by default
Tool router lacks per-action allowlists
Model can access secrets directly
No human approval thresholds

REQUIRED MITIGATIONS (PROOFGATE STANDARD)

No secrets in models

Credentials must never be passed to the model. Tools are reachable only through a router.

Least privilege router

Scopes and permissions must be minimized per action. Deny by default.

Approval thresholds

High-risk actions require signed approvals. Approvals expire and are single-use.

Receipts for causality

Signed receipts allow post-incident attribution and rollback analysis.
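
One way to picture a least-privilege router is a per-action allowlist that holds the credentials itself. The sketch below is illustrative only: the action names, the `_SECRETS` store, and the handler are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical credential store. It lives on the router side and is never
# placed in model-visible context.
_SECRETS = {"crm": "token-held-by-router-only"}


class ActionDenied(Exception):
    """Raised when an action is not on the allowlist."""


def _read_contact(args: dict) -> dict:
    _token = _SECRETS["crm"]  # a real handler would use this to call the API
    return {"contact_id": args["contact_id"], "status": "read"}


# Per-action allowlist. Anything not listed here is denied by default.
ROUTES: Dict[str, Callable[[dict], dict]] = {
    "crm.read_contact": _read_contact,
}


def route(action: str, args: dict) -> dict:
    """Dispatch an action to its handler, or deny it if it is not listed."""
    handler = ROUTES.get(action)
    if handler is None:
        raise ActionDenied(f"action not allowlisted: {action}")
    return handler(args)
```

Because the model only ever names an action and supplies arguments, a mistaken or adversarial request for `crm.delete_contact` fails at the router instead of at the SaaS API.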

THREAT

Cross-Tenant Leakage (Context & Tools Bleed)

WHY IT HAPPENS

Stateful assistants can accidentally mix memory, retrieval, or tool results across users/orgs when boundaries are not explicit and enforced.

IMPACT

Data leakage between customers
Accidental disclosure of internal documents
Wrong customer receives communications
Regulatory and contractual exposure

DETECTION SIGNALS

Shared caches without tenant keying
Retrieval results include other tenant identifiers
Tools invoked with shared credentials
Logs show cross-tenant IDs in responses

REQUIRED MITIGATIONS (PROOFGATE STANDARD)

Tenant-bound intents

Intent Envelopes must include tenant identity and be validated for isolation.

Router isolation

Tool credentials and access must be tenant-scoped; no shared tool authority.

Audit by tenant

Append-only audit must record tenant context and enable forensic partitioning.

Receipt hashing

Hash and sign intent/execution so cross-tenant mixups are provable.
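
A minimal sketch of tenant binding, assuming the tenant identity comes from the authenticated session and never from the model. The `Session` and `Intent` types, the credential map, and the cache-key helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Session:
    tenant_id: str  # set by authentication, not by model output


@dataclass(frozen=True)
class Intent:
    tenant_id: str           # tenant the intent claims to act for
    action: str
    resource_tenant_id: str  # tenant that owns the target resource


class TenantMismatch(Exception):
    """Raised when an intent crosses a tenant boundary."""


# Hypothetical per-tenant credentials: no shared tool authority.
TENANT_CREDENTIALS = {"tenant-a": "cred-a", "tenant-b": "cred-b"}


def validate_tenant(session: Session, intent: Intent) -> None:
    """Reject any intent whose tenant fields differ from the session tenant."""
    if intent.tenant_id != session.tenant_id:
        raise TenantMismatch("intent tenant differs from session tenant")
    if intent.resource_tenant_id != session.tenant_id:
        raise TenantMismatch("resource belongs to another tenant")


def credential_for(session: Session, intent: Intent) -> str:
    """Return the tenant-scoped credential only after isolation is validated."""
    validate_tenant(session, intent)
    return TENANT_CREDENTIALS[session.tenant_id]


def cache_key(session: Session, query: str) -> Tuple[str, str]:
    """Key caches by tenant so identical queries never share entries."""
    return (session.tenant_id, query)
```

The check runs before any credential is selected, so a retrieval result or model output that names another tenant cannot borrow that tenant's authority.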

THREAT

TOCTOU Drift (Time-of-Check / Time-of-Use)

WHY IT HAPPENS

A decision can be evaluated under one policy state, then executed later under a different policy state if the system does not re-check at execution time.

IMPACT

Previously-allowed action executes after policy tightened
Approval token used after conditions change
Race conditions enable bypass
Inconsistent enforcement across distributed workers

DETECTION SIGNALS

Approvals not bound to intent hash
Approvals used minutes/hours later without re-check
Policy changes without invalidation mechanism
Async execution without deterministic guard

REQUIRED MITIGATIONS (PROOFGATE STANDARD)

Re-check at approval/execution

System must re-run the gate at time of execution with current policy state.

Expiring approvals

Approval tokens expire quickly and are one-time use.

Intent-hash binding

Approvals must be tied to the exact canonical intent hash.

Signed receipts

Receipt includes timestamps and hashes to reconstruct state transitions.
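
The re-check and hash-binding requirements can be sketched as follows. This assumes intents are JSON-serializable dicts and that a policy is a dict with an `allowed_actions` set; both shapes, and the canonicalization rule (sorted keys, compact separators), are assumptions rather than the ProofGate specification.

```python
import hashlib
import json


def canonical_intent_hash(intent: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order cannot change the hash."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def gate(intent: dict, policy: dict) -> str:
    """Hypothetical deterministic gate: allow only listed actions."""
    if intent.get("action") in policy["allowed_actions"]:
        return "execute"
    return "deny"


def execute_if_still_allowed(intent: dict, approved_hash: str, current_policy: dict) -> bool:
    """Run at time of use: verify the hash binding, then re-run the gate."""
    if canonical_intent_hash(intent) != approved_hash:
        return False  # the intent changed after it was approved
    return gate(intent, current_policy) == "execute"
```

If policy is tightened between approval and execution, the second gate evaluation denies the action even though the approval itself is still well-formed.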

THREAT

Replay & Forgery (Approvals / Actions)

WHY IT HAPPENS

If approvals are guessable or unsigned, attackers can forge them. If approvals are reusable, attackers can replay them.

IMPACT

Unauthorized execution using forged approvals
Repeated execution using replayed approvals
Bypassing human-in-the-loop controls
Undetectable abuse if logging is weak

DETECTION SIGNALS

Approval tokens are predictable (e.g., appr_intentId)
Approvals lack expiry
Approvals not one-time use
Approvals not tied to intent hash

REQUIRED MITIGATIONS (PROOFGATE STANDARD)

Signed approval tokens

Approvals must be cryptographically signed and verifiable.

Expiry + one-time use

Tokens expire and are invalidated after execution.

Intent hash binding

Token payload includes the intentHash and is verified against stored pending state.

Audit evidence

Append-only log records approve→execute chain; receipts prove sequence.
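
As an illustration of signed, expiring, single-use approvals, the sketch below uses an HMAC-SHA256 signature from the Python standard library. The token layout, the `nonce` and `exp` fields, the in-memory used-nonce set, and the process-local key are assumptions; a real deployment would likely use a managed key and durable storage. The `expected_intent_hash` argument stands in for the stored pending state.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

SIGNING_KEY = secrets.token_bytes(32)  # hypothetical process-local key
_used_nonces = set()                   # hypothetical one-time-use registry


def _sign(body: bytes) -> str:
    return hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()


def issue_approval(intent_hash: str, ttl_seconds: int = 300,
                   now: Optional[float] = None) -> str:
    """Create a signed token bound to one intent hash, with expiry and nonce."""
    now = time.time() if now is None else now
    payload = {
        "intentHash": intent_hash,
        "exp": now + ttl_seconds,
        "nonce": secrets.token_hex(16),  # unguessable, unlike appr_<intentId>
    }
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    return body.decode() + "." + _sign(body)


def redeem_approval(token: str, expected_intent_hash: str,
                    now: Optional[float] = None) -> bool:
    """Verify signature, expiry, intent binding, and single use, in that order."""
    now = time.time() if now is None else now
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    if not hmac.compare_digest(sig.encode(), _sign(body.encode()).encode()):
        return False  # forged or altered token
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["exp"] <= now:
        return False  # expired
    if payload["intentHash"] != expected_intent_hash:
        return False  # approval was issued for a different intent
    if payload["nonce"] in _used_nonces:
        return False  # replay
    _used_nonces.add(payload["nonce"])
    return True
```

The payload is parsed only after the signature verifies, so attacker-controlled bytes are never interpreted before they are authenticated.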

THREAT

Silent Mutations (No Receipts / No Audit)

WHY IT HAPPENS

Agent systems often act through external tools without producing immutable proof. When something goes wrong, root cause becomes subjective.

IMPACT

No attribution of who approved what
No deterministic replay of decision chain
Inability to prove tampering vs bug vs model error
Operational chaos in incident response

DETECTION SIGNALS

Actions lack IDs and hashes
Logs are mutable or missing
No signed artifacts
Only UI logs exist (not durable)

REQUIRED MITIGATIONS (PROOFGATE STANDARD)

Signed decision receipts

Every decision emits a signed receipt with intent hash.

Signed execution receipts

Every execution emits a signed receipt with execution hash.

Append-only audit

Audit events are written as immutable JSONL lines (memory).

Deterministic router

Only the router performs side effects; receipts reference the router result.
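
A minimal sketch of signed receipts written as append-only JSONL, kept in memory here. Each line carries the hash of the previous line, so editing or removing an earlier line breaks verification. The hash chaining, the receipt fields (`seq`, `prev`, `hash`, `sig`), and the demo key are assumptions for illustration, not the ProofGate receipt format.

```python
import hashlib
import hmac
import json

RECEIPT_KEY = b"demo-key-not-for-production"  # hypothetical signing key
GENESIS = "0" * 64                            # "previous hash" of the first line


def _digest(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _sign(digest: str) -> str:
    return hmac.new(RECEIPT_KEY, digest.encode(), hashlib.sha256).hexdigest()


class AuditLog:
    """In-memory, append-only list of JSONL receipt lines."""

    def __init__(self):
        self._lines = []

    def append(self, event: dict) -> dict:
        """Append one event and return its signed receipt."""
        prev = json.loads(self._lines[-1])["hash"] if self._lines else GENESIS
        body = {"seq": len(self._lines), "prev": prev, "event": event}
        digest = _digest(body)
        receipt = {**body, "hash": digest, "sig": _sign(digest)}
        self._lines.append(json.dumps(receipt, sort_keys=True))
        return receipt

    def verify(self) -> bool:
        """Recompute every hash and signature and check the chain links."""
        prev = GENESIS
        for i, line in enumerate(self._lines):
            r = json.loads(line)
            body = {"seq": r["seq"], "prev": r["prev"], "event": r["event"]}
            digest = _digest(body)
            sig_ok = hmac.compare_digest(r["sig"].encode(), _sign(digest).encode())
            if r["seq"] != i or r["prev"] != prev or r["hash"] != digest or not sig_ok:
                return False
            prev = r["hash"]
        return True
```

A decision receipt and an execution receipt for the same intent would be two consecutive `append` calls, which gives incident response an ordered, verifiable approve-then-execute chain.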
