benchmark 2025

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Karolina Korgul ¹, Yushi Yang ¹, Arkadiusz Drohomirecki ², Piotr Błaszczyk ³, Will Howard ³, Lukas Aichberger ^1,4, Chris Russell ¹, Philip Torr ¹, Adam Mahdi ¹, Adel Bibi ^1,2

¹ University of Oxford

² SoftServe

³ Independent

⁴ Johannes Kepler University Linz

0 citations · 20 references · arXiv

Published on arXiv

2512.23128

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Frontier LLM web agents fall for prompt injection in 25% of tasks on average (13% for GPT-5, 43% for DeepSeek-R1), with small interface or contextual changes frequently doubling attack success rates.

TRAP (Task-Redirecting Agent Persuasion)

Novel technique introduced

Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25\% of tasks on average (13\% for GPT-5 to 43\% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.

Key Contributions

TRAP benchmark with 630 modular prompt injections across 18 realistic tasks and 6 website environments, measuring agent susceptibility to social-engineering-driven indirect prompt injection
Five-dimensional modular attack space combining interface manipulation and persuasion modules (including Cialdini's principles), enabling controlled ablation of factors that drive injection success
Empirical evaluation across six frontier LLM agents (GPT-5 to DeepSeek-R1), revealing 13–43% susceptibility rates and systemic, psychologically driven vulnerabilities

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Datasets

TRAP benchmark (18 tasks × 6 website clones × 630 injections)

Applications

web agentsemail managementprofessional networkingautonomous browsing

Read PDF arXiv DOI

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

PEAR: Planner-Executor Agent Robustness Benchmark

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B

When Hallucination Costs Millions: Benchmarking AI Agents in High-Stakes Adversarial Financial Markets

Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents

Mind the Gap: Evaluating Model- and Agentic-Level Vulnerabilities in LLMs with Action Graphs

The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models