Latest papers

4 papers
benchmark arXiv Mar 16, 2026 · 21d ago

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu et al. · Gray Swan AI · OpenAI +6 more

Large-scale red teaming competition finds all frontier LLM agents vulnerable to concealed indirect prompt injection attacks with 0.5-8.5% success rates

Prompt Injection Excessive Agency nlpmultimodal
PDF
survey IACR ePrint Dec 1, 2025 · Dec 2025

Systems Security Foundations for Agentic Computing

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda et al. · Google · University of California +5 more

Surveys agentic AI security through a systems-security lens, covering prompt injection, tool-use risks, and 11 real-world attack case studies

Prompt Injection Insecure Plugin Design Excessive Agency nlp
3 citations PDF
benchmark arXiv Sep 22, 2025 · Sep 2025

D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models

Satyapriya Krishna, Andy Zou, Rahul Gupta et al. · Amazon Nova Responsible AI · Center for AI Safety +2 more

Benchmark dataset for detecting LLMs that hide malicious chain-of-thought behind benign outputs via adversarial system prompt injections

Prompt Injection nlp
2 citations PDF
benchmark arXiv Aug 27, 2025 · Aug 2025

Evaluating Language Model Reasoning about Confidential Information

Dylan Sam, Alexander Robey, Andy Zou et al. · Carnegie Mellon University · Gray Swan AI +1 more

Benchmarks LLM ability to guard confidential info, finding reasoning traces leak secrets and jailbreaks bypass access control

Sensitive Information Disclosure Prompt Injection nlp
PDF Code