ML Security Papers

Latest papers

4 papers

benchmark arXiv Mar 16, 2026 · 21d ago

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu et al. · Gray Swan AI · OpenAI +6 more

Large-scale red teaming competition finds all frontier LLM agents vulnerable to concealed indirect prompt injection attacks with 0.5-8.5% success rates

Prompt Injection Excessive Agency nlpmultimodal

PDF

survey IACR ePrint Dec 1, 2025 · Dec 2025

Systems Security Foundations for Agentic Computing

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda et al. · Google · University of California +5 more

Surveys agentic AI security through a systems-security lens, covering prompt injection, tool-use risks, and 11 real-world attack case studies

Prompt Injection Insecure Plugin Design Excessive Agency nlp

3 citations PDF

benchmark arXiv Sep 22, 2025 · Sep 2025

D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models

Satyapriya Krishna, Andy Zou, Rahul Gupta et al. · Amazon Nova Responsible AI · Center for AI Safety +2 more

Benchmark dataset for detecting LLMs that hide malicious chain-of-thought behind benign outputs via adversarial system prompt injections

Prompt Injection nlp

2 citations PDF

benchmark arXiv Aug 27, 2025 · Aug 2025

Evaluating Language Model Reasoning about Confidential Information

Dylan Sam, Alexander Robey, Andy Zou et al. · Carnegie Mellon University · Gray Swan AI +1 more

Benchmarks LLM ability to guard confidential info, finding reasoning traces leak secrets and jailbreaks bypass access control

Sensitive Information Disclosure Prompt Injection nlp

PDF Code

Latest papers

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Systems Security Foundations for Agentic Computing

D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models

Evaluating Language Model Reasoning about Confidential Information

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue