The corpus contains 151 test cases across 17 categories, including DLP evasion, prompt injection, SSRF, tool poisoning, encoding chains, shell obfuscation, and A2A scanning. It is the same corpus Pipelock tests itself against before every release.

The gauntlet tests the security tool, not the agent. View the scoring methodology.

How to run the gauntlet yourself
git clone https://github.com/luckyPipewrench/agent-egress-bench.git
cd agent-egress-bench/runner
go build -o aeb-gauntlet .
./aeb-gauntlet --cases ../cases --profile my-profile.json --output results.json \
  --submit https://pipelab.org/api/results
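
The commands above assume that `git` and `go` are installed and that `my-profile.json` exists in the `runner` directory. A minimal pre-flight sketch can check those assumptions before the build. The helper names `require_cmd`, `require_file`, and `preflight` are hypothetical and not part of the repository:

```sh
#!/usr/bin/env sh
# Sketch only: pre-flight checks before running the gauntlet.
# require_cmd, require_file and preflight are illustrative names,
# not helpers shipped with agent-egress-bench.

# Return 0 if the command exists on PATH; otherwise report it and return 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required command: $1" >&2
  return 1
}

# Return 0 if the path is a readable regular file; otherwise report it and return 1.
require_file() {
  if [ -f "$1" ] && [ -r "$1" ]; then
    return 0
  fi
  echo "missing required file: $1" >&2
  return 1
}

# Check the tools needed to clone and build, plus the profile passed via --profile.
preflight() {
  profile="$1"
  require_cmd git || return 1
  require_cmd go || return 1
  require_file "$profile" || return 1
  echo "preflight ok"
}
```

One way to use it, after defining the functions in the current shell, is to run `preflight my-profile.json` from the `runner` directory and continue with the `./aeb-gauntlet` invocation only if it succeeds.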

See the adoption guide for building a runner. To submit results, open a discussion.