The short version

Guardrails check the model’s intent before it acts. They run inside the inference pipeline.

An agent firewall checks what goes over the wire after the model acts. It runs at the network layer.

Guardrails catch bad reasoning. Agent firewalls catch bad traffic. They fail in different ways. Use both.

The trust boundary problem

Here’s why this matters: guardrails and the model share a trust boundary.

Guardrails like NeMo Guardrails, Guardrails AI, and LlamaFirewall run in the same process as the model or in the same inference pipeline. They use the same text processing, the same tokenization, and sometimes the same model architecture to detect attacks.

A prompt injection that’s good enough to fool the model has a decent chance of fooling the guardrail too. They’re processing the same input with similar techniques.

An agent firewall operates outside that trust boundary. It sees raw HTTP requests and MCP messages. Regex-based DLP doesn’t care what the model was thinking. It just checks whether the outbound request contains an API key. Pattern matching for injection doesn’t need to understand context. It just checks whether the response contains “ignore previous instructions.”
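Here is a minimal sketch of those two checks, to show how little context they need. The patterns and function names are illustrative assumptions, not Pipelock's actual rules. A real firewall would carry a much larger pattern set.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
# AWS access key IDs start with AKIA or ASIA followed by 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
INJECTION_PHRASE = re.compile(
    r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE
)


def request_leaks_credential(outbound_body: str) -> bool:
    """Return True if the outbound request body contains an AWS-style access key ID."""
    return AWS_ACCESS_KEY_ID.search(outbound_body) is not None


def response_looks_injected(response_body: str) -> bool:
    """Return True if the response body contains a known injection phrase."""
    return INJECTION_PHRASE.search(response_body) is not None
```

Neither function knows anything about the prompt, the model, or the conversation. They only look at bytes on the wire.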

Different layers, different techniques, different failure modes. That’s defense in depth.

How guardrails work

Guardrails intercept model interactions before they reach external systems:

User Input → Guardrail (check input) → Model → Guardrail (check output) → Action

NeMo Guardrails (NVIDIA): Define conversation rails in a custom language (Colang). The model’s outputs are checked against allowed patterns before being executed.

Guardrails AI: Define validators using Pydantic models. Model outputs are validated against schemas and can be corrected automatically.

LlamaFirewall (Meta): Three-scanner pipeline. PromptGuard classifies inputs, AlignmentCheck audits chain-of-thought, CodeShield scans generated code.

All three are Python libraries. They hook into the model pipeline. They’re effective when you control the inference chain.
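The input check, model call, output check flow can be sketched generically. This is not the API of NeMo Guardrails, Guardrails AI, or LlamaFirewall. Every name below (`guarded_call`, `GuardrailRejection`, the check callables) is hypothetical and only illustrates where the hooks sit.

```python
from typing import Callable

# A check returns True when the text is allowed through (hypothetical interface).
Check = Callable[[str], bool]


class GuardrailRejection(Exception):
    """Raised when a guardrail check refuses the input or the output."""


def guarded_call(
    user_input: str,
    model: Callable[[str], str],
    check_input: Check,
    check_output: Check,
) -> str:
    """Call the model only if the input passes, and release output only if it passes."""
    if not check_input(user_input):
        raise GuardrailRejection("input rejected before reaching the model")
    output = model(user_input)
    if not check_output(output):
        raise GuardrailRejection("output rejected before any action is taken")
    return output
```

Both checks wrap the model call directly, which is why this approach needs access to the inference chain.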

How an agent firewall works

An agent firewall intercepts network traffic after the model has decided to act:

Model decides to act → Agent sends request → Agent Firewall (scan request + response) → External system

It doesn’t know or care what the model was thinking. It scans the traffic itself: outbound HTTP requests, the responses that come back, and MCP messages.

What guardrails catch that firewalls don’t

Unsafe reasoning. If the model is thinking “I should read the SSH key file,” guardrails can catch that intent before the agent writes the code. An agent firewall only sees the result after execution.

Bad code generation. Tools like CodeShield (in LlamaFirewall) can scan generated code for known vulnerabilities before it runs.

Off-topic behavior. NeMo Guardrails can constrain the model to stay on-topic and follow conversational rails. Agent firewalls don’t care about conversation flow.

Hallucination filtering. Some guardrail frameworks include factuality checks. Firewalls don’t validate content accuracy.

What firewalls catch that guardrails don’t

Credential leaks. When an agent’s outbound request contains an AWS key encoded in base64, DLP catches it. Guardrails don’t scan outbound HTTP.
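One way to catch a base64-wrapped key is to decode every base64-looking run in the request and scan the result as well. The sketch below assumes that approach. The patterns and the `find_aws_keys` name are illustrative, not how any particular DLP engine is built.

```python
import base64
import binascii
import re

# Illustrative patterns (assumptions for this sketch).
AWS_ACCESS_KEY_ID = re.compile(r"(?:AKIA|ASIA)[0-9A-Z]{16}")
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def find_aws_keys(body: str) -> list[str]:
    """Scan the raw body, then every cleanly decodable base64 run inside it."""
    hits = AWS_ACCESS_KEY_ID.findall(body)
    for run in BASE64_RUN.findall(body):
        try:
            decoded = base64.b64decode(run, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 (for example, wrong padding); skip it
        text = decoded.decode("ascii", errors="ignore")
        hits.extend(AWS_ACCESS_KEY_ID.findall(text))
    return hits
```

This only handles one layer of standard base64 and will miss runs that are glued to other alphanumeric text. It still shows why simple encoding is not enough to hide a credential from the network layer.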

MCP tool poisoning. A malicious MCP server can change its tool descriptions mid-session (rug-pull) to instruct the agent to exfiltrate data. An agent firewall fingerprints descriptions and detects changes. Guardrails don’t monitor MCP tool descriptions.
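Fingerprinting can be as simple as hashing each tool description the first time it appears and comparing on every later listing. The sketch below assumes that design. The `ToolDescriptionWatcher` class and the dict-of-descriptions input are hypothetical, not Pipelock's internals or the MCP wire format.

```python
import hashlib


def fingerprint(description: str) -> str:
    """Return a stable SHA-256 digest of a tool description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


class ToolDescriptionWatcher:
    """Remember the first description seen for each tool and flag later changes."""

    def __init__(self) -> None:
        self._baseline: dict[str, str] = {}

    def changed_tools(self, tools: dict[str, str]) -> list[str]:
        """Take a mapping of tool name to description; return names whose text changed."""
        changed = []
        for name, description in tools.items():
            digest = fingerprint(description)
            # setdefault stores the digest on first sight and returns the baseline after that.
            if self._baseline.setdefault(name, digest) != digest:
                changed.append(name)
        return changed
```

The first listing sets the baseline. Any later listing with different text for the same tool name is reported, whatever the new text says.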

SSRF. An injection could tell the agent to request http://169.254.169.254/latest/meta-data/ to steal cloud credentials. An agent firewall blocks requests to private and link-local addresses like this one. Guardrails don’t operate at the network layer.
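The address check itself is small. Here is a sketch using Python's standard `ipaddress` module. The function names are illustrative, and a production proxy would also need to connect to the exact address it checked so a second DNS lookup cannot return something different.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _is_internal(ip) -> bool:
    """True for private, loopback, link-local, or reserved addresses."""
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


def url_is_blocked(url: str) -> bool:
    """Return True if the URL targets an internal address (or cannot be checked)."""
    host = urlsplit(url).hostname
    if not host:
        return True  # fail closed on URLs without a usable host
    try:
        candidates = [ipaddress.ip_address(host)]
    except ValueError:
        # Not an IP literal: resolve the name and check every address it maps to.
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return True  # fail closed when the name does not resolve
        candidates = [ipaddress.ip_address(info[4][0]) for info in infos]
    return any(_is_internal(ip) for ip in candidates)
```

The cloud metadata address 169.254.169.254 sits in the link-local range, so it is rejected before any request leaves the machine.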

Post-bypass traffic. If an injection gets past the guardrail (and some do), the resulting malicious request still has to go through the agent firewall. Two independent chances to catch the attack.

Closed-pipeline agents. Claude Code, Cursor, GitHub Copilot, and most commercial agents use hosted models. You can’t insert guardrails into their inference chain. But you can route their traffic through a proxy.

The bypass problem

Guardrails have a known bypass problem, one that published research has demonstrated.

This doesn’t mean guardrails are useless. They catch a lot of attacks. But they operate in the same trust domain as the model, so a sufficiently clever attack can fool both.

Agent firewalls have a different bypass surface. Pattern-matching misses novel phrasings. DLP regex misses encrypted payloads. But these failures are independent of the guardrail’s failures. A prompt injection that fools the model and the guardrail might still trigger DLP when the resulting request contains a recognizable credential pattern.

Side-by-side

| | Guardrails | Agent Firewall |
| --- | --- | --- |
| Where it runs | In the model pipeline | At the network boundary |
| What it inspects | Model inputs/outputs, reasoning | HTTP requests, MCP messages |
| Credential scanning | No | Yes (DLP) |
| Injection detection | Model-based classification | Pattern matching |
| MCP security | No | Yes |
| SSRF protection | No | Yes |
| Works with closed agents | No (need pipeline access) | Yes (proxy-based) |
| Can be bypassed by injection | Yes (same trust boundary) | Different failure modes |

How to use both

The best setup puts guardrails inside the pipeline and a firewall outside it:

User Input → Guardrail → Model → Guardrail → Agent → Agent Firewall → External System

Guardrails catch bad intent. The firewall catches bad traffic. If one misses something, the other might not.

Practical setup for custom Python agents:

  1. Use LlamaFirewall or NeMo Guardrails in your agent code
  2. Run Pipelock as the proxy
  3. Set HTTPS_PROXY=http://127.0.0.1:8888 for the agent process
  4. Wrap MCP servers with pipelock mcp proxy
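Step 3 can be done from a small launcher script so that only the agent process inherits the proxy setting. This is a sketch, assuming the agent's HTTP client honors `HTTPS_PROXY` and that the proxy is already listening on the address from the list above. The `proxied_env` and `run_agent` helpers are hypothetical.

```python
import os
import subprocess


def proxied_env(proxy_url: str = "http://127.0.0.1:8888") -> dict[str, str]:
    """Copy the current environment and point HTTPS traffic at the local proxy."""
    env = dict(os.environ)
    env["HTTPS_PROXY"] = proxy_url
    return env


def run_agent(command: list[str]) -> int:
    """Start the agent as a child process with the proxy set; return its exit code."""
    return subprocess.run(command, env=proxied_env(), check=False).returncode
```

For example, `run_agent(["python", "my_agent.py"])` would launch a hypothetical `my_agent.py` with its outbound HTTPS routed through the proxy, without changing the environment of the parent shell.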

For commercial agents (Claude Code, Cursor): Guardrails aren’t an option since you can’t modify the pipeline. Use Pipelock at the network layer. It’s your only enforcement point.

How Pipelock fits

Pipelock is an open-source agent firewall. It handles the network layer: DLP, injection detection, SSRF protection, MCP scanning, and rate limiting. It’s the second layer of defense that catches what guardrails miss.

It doesn’t replace guardrails. If you can deploy guardrails, do it. Pipelock handles the traffic that guardrails can’t see.

Further reading