Yarin Gal

attack arXiv Oct 8, 2025 · Oct 2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Alexandra Souly, Javier Rando, Ed Chapman et al. · UK AI Security Institute · Anthropic +3 more

Shows LLM backdoor poisoning needs only ~250 documents regardless of model size, making attacks more practical at scale

Model Poisoning Data Poisoning Attack Training Data Poisoning nlp

32 citations 2 influentialPDF

attack arXiv Oct 2, 2025 · Oct 2025

ToolTweak: An Attack on Tool Selection in LLM-based Agents

Jonathan Sneh, Ruomei Yan, Jialin Yu et al. · University of Oxford · Microsoft

Adversarially crafts tool names and descriptions to bias LLM agents into selecting attacker-controlled tools over fair alternatives

Insecure Plugin Design Prompt Injection nlp

6 citations 1 influentialPDF

tool arXiv Oct 2, 2025 · Oct 2025

RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents

Chengquan Guo, Chulin Xie, Yu Yang et al. · University of Chicago · University of Illinois Urbana-Champaign +5 more

Automated red-teaming agent that adaptively combines jailbreak tools to uncover safety vulnerabilities in LLM-based code agents

Prompt Injection nlp

4 citations PDF

attack arXiv Feb 13, 2026 · 7w ago

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Akshat Naik, Jay J Culligan, Yarin Gal et al. · University of Oxford · Toyota Motor Europe

Indirect prompt injection attack exfiltrates sensitive data across multi-agent LLM orchestrators, bypassing data access controls with a single injected payload

Prompt Injection Sensitive Information Disclosure nlp

PDF

attack arXiv Feb 16, 2026 · 7w ago

Boundary Point Jailbreaking of Black-Box LLMs

Xander Davies, Giorgi Giglemiani, Edmund Lau et al. · UK AI Security Institute · University of Oxford

Fully black-box automated jailbreak using binary classifier feedback and curriculum learning defeats Anthropic and GPT-5 safety classifiers