It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
Karolina Korgul 1, Yushi Yang 1, Arkadiusz Drohomirecki 2, Piotr Błaszczyk 3, Will Howard 3, Lukas Aichberger 1,4, Chris Russell 1, Philip Torr 1, Adam Mahdi 1, Adel Bibi 1,2
Published on arXiv
2512.23128
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
Frontier LLM web agents fall for prompt injection in 25% of tasks on average (13% for GPT-5, 43% for DeepSeek-R1), with small interface or contextual changes frequently doubling attack success rates.
TRAP (Task-Redirecting Agent Persuasion)
Novel technique introduced
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25\% of tasks on average (13\% for GPT-5 to 43\% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.
Key Contributions
- TRAP benchmark with 630 modular prompt injections across 18 realistic tasks and 6 website environments, measuring agent susceptibility to social-engineering-driven indirect prompt injection
- Five-dimensional modular attack space combining interface manipulation and persuasion modules (including Cialdini's principles), enabling controlled ablation of factors that drive injection success
- Empirical evaluation across six frontier LLM agents (GPT-5 to DeepSeek-R1), revealing 13–43% susceptibility rates and systemic, psychologically driven vulnerabilities