ML Security Papers

Latest papers

2 papers

defense · arXiv · Feb 5, 2026

Zhenxiong Yu, Zhi Yang, Zhiheng Jin et al. · SUFE · NUS +5 more

Event-driven LLM agent defense that selectively triggers hierarchical screening against prompt injection and multi-stage agent attacks

Prompt Injection · Excessive Agency · nlp

benchmark · arXiv · Sep 22, 2025

Satyapriya Krishna, Andy Zou, Rahul Gupta et al. · Amazon Nova Responsible AI · Center for AI Safety +2 more

Benchmark dataset for detecting LLMs that, under adversarial system prompt injection, hide malicious chain-of-thought reasoning behind benign outputs

Prompt Injection · nlp

2 citations