benchmark 2025

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Karolina Korgul 1, Yushi Yang 1, Arkadiusz Drohomirecki 2, Piotr Błaszczyk 3, Will Howard 3, Lukas Aichberger 1,4, Chris Russell 1, Philip Torr 1, Adam Mahdi 1, Adel Bibi 1,2

0 citations · 20 references · arXiv

α

Published on arXiv

2512.23128

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Frontier LLM web agents fall for prompt injection in 25% of tasks on average (13% for GPT-5, 43% for DeepSeek-R1), with small interface or contextual changes frequently doubling attack success rates.

TRAP (Task-Redirecting Agent Persuasion)

Novel technique introduced


Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25\% of tasks on average (13\% for GPT-5 to 43\% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.


Key Contributions

  • TRAP benchmark with 630 modular prompt injections across 18 realistic tasks and 6 website environments, measuring agent susceptibility to social-engineering-driven indirect prompt injection
  • Five-dimensional modular attack space combining interface manipulation and persuasion modules (including Cialdini's principles), enabling controlled ablation of factors that drive injection success
  • Empirical evaluation across six frontier LLM agents (GPT-5 to DeepSeek-R1), revealing 13–43% susceptibility rates and systemic, psychologically driven vulnerabilities

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_timetargeted
Datasets
TRAP benchmark (18 tasks × 6 website clones × 630 injections)
Applications
web agentsemail managementprofessional networkingautonomous browsing