AI agent security is not one tool or one layer. Your agent has API keys, file system access, and an internet connection. When something goes wrong, secrets leave through HTTP requests, MCP tool calls, or DNS queries before anyone notices. The model’s built-in safety training helps, but it’s not a security boundary. It can be bypassed with the right prompt.

Real defense requires multiple layers, each catching different attacks at different points. Most teams either have zero layers or think one tool covers everything. Neither is true.

The three security layers

There are three fundamentally different places to enforce security on AI agents. They’re not interchangeable. Each one operates at a different level of the stack, sees different data, and stops different attacks.

Layer 1: Agent-side hooks

What it is: Code that intercepts tool calls before the agent executes them. Hooks see the tool name and arguments, run checks, and return allow/deny.

Examples: Sage (Gen Digital), AgentGuard (GoPlus), Claude Code’s built-in permission system.

What it catches: Dangerous shell commands, suspicious file writes, known-bad URLs in tool arguments, package supply chain attacks.

What it misses: Anything that doesn’t go through the hook system. HTTP requests made by tools after they execute. Secrets encoded in base64 or hex. Response content returning from tools (hooks run pre-execution, not post). MCP tool descriptions that poison the agent before any tool is called.

The fundamental limit: Hooks run inside the agent’s process. A successful prompt injection can potentially disable or bypass them.

Layer 2: Inference guardrails

What it is: A classifier (usually ML-based) that analyzes prompts and completions for safety. Either embedded in the application code or called as an API.

Examples: LlamaFirewall (Meta), NeMo Guardrails (NVIDIA), Guardrails AI.

What it catches: Direct jailbreak attempts, harmful content generation, some prompt injection patterns in the model’s input and output.

What it misses: Secrets in HTTP traffic (guardrails don’t see HTTP requests). MCP tool poisoning (guardrails don’t parse MCP protocol). Encoded exfiltration (base64 secrets in URLs look like normal text to a classifier). DNS-based exfiltration. SSRF. Anything that happens at the network layer.

The fundamental limit: Guardrails analyze text for semantic safety. They don’t see network traffic. An agent that passes a guardrail check can still exfiltrate your AWS keys through a perfectly benign-looking HTTP request.

Layer 3: Egress inspection

What it is: A network proxy between the agent and the internet. All traffic flows through it. It scans HTTP requests, MCP messages, and WebSocket frames for security violations. DLP runs before DNS resolution, preventing secret exfiltration through DNS query strings.

Examples: Pipelock does content-inspecting egress proxy for AI agents (DLP, injection, SSRF, encoding evasion across HTTP, MCP, and WebSocket). GitHub’s agent workflow firewall does domain-level allowlisting without content scanning. Most other tools in this space either scan MCP only or do allowlisting only.

What it catches: Secret exfiltration (API keys, tokens, credentials in URLs, headers, POST bodies, tool arguments), prompt injection in fetched content and tool responses, SSRF (private IPs, cloud metadata, DNS rebinding), MCP tool poisoning and rug-pulls, multi-tool exfiltration chains, encoded secrets (base64, hex, URL encoding, Unicode tricks).

What it misses: Attacks that don’t touch the network (local file manipulation, in-memory reasoning corruption). Jailbreaks that change the agent’s behavior without producing suspicious traffic. That’s where hooks and guardrails complement it.

The fundamental advantage: Runs as a separate process with no shared memory. The agent can’t disable it or bypass it through prompt injection. Even if the agent is fully compromised, its traffic still passes through the proxy.

Why one layer isn’t enough

AttackHooksGuardrailsEgress Inspection
Shell command injectionCatchesMissesMisses (no shell visibility)
Secret in URL query paramMisses (no URL scanning)Misses (not text safety)Catches
Secret in POST bodyMisses (no body visibility)MissesCatches
MCP tool poisoningMisses (no MCP parsing)Misses (no MCP parsing)Catches
Prompt injection in web pageMisses (post-fetch)Catches (maybe)Catches
Base64-encoded secret in headerMissesMissesCatches (6-pass decode)
SSRF to cloud metadataMissesMissesCatches
DNS exfiltrationMissesMissesCatches
Direct jailbreak promptMisses (pre-tool)CatchesMisses (no prompt visibility)
Dangerous file writeCatchesMissesMisses (local operation)

No layer covers every row. An agent protected by all three has real defense-in-depth.

Getting started

If you have zero layers today, start with egress inspection. It covers the widest attack surface with the least integration work. Set HTTPS_PROXY to route HTTP traffic through a scanning proxy, and wrap MCP servers through the proxy for tool call inspection.

# Install
brew install luckyPipewrench/tap/pipelock

# Set up Claude Code (hooks + MCP proxy)
pipelock claude setup

# Or just proxy HTTP traffic for any agent
export HTTPS_PROXY=http://127.0.0.1:8888
pipelock run

Then add hooks and guardrails as your threat model demands. The three layers are complementary, not competing.

Further reading

Ready to validate your deployment?