Adam Mahdi

h-index: 3 · 30 citations · 9 papers (total)

Papers in Database (2)

attack · arXiv · Oct 20, 2025

Agentic Reinforcement Learning for Search is Unsafe

Yushi Yang, Shreyansh Padarha, Andrew Lee et al. · University of Oxford · Harvard University

Discovers two simple prompt-level attacks that bypass safety in RL-trained LLM search agents by triggering search before refusal tokens

Prompt Injection · Excessive Agency · nlp · reinforcement-learning
1 citation · PDF

benchmark · arXiv · Dec 29, 2025

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki et al. · University of Oxford · SoftServe +2 more

Benchmarks indirect prompt injection susceptibility of six frontier LLM agents on realistic web tasks using persuasion techniques

Prompt Injection · Excessive Agency · nlp
PDF