ML Security Papers

Latest papers

3 papers

attack · arXiv · Jan 20, 2026

Jiayi Yuan, Jonathan Nöther, Natasha Jaques et al. · University of Washington · Max Planck Institute for Software Systems

Evolutionary meta-search automatically designs agentic jailbreak pipelines that achieve a 96-100% attack success rate (ASR) on Llama, GPT-4o, and Claude

Prompt Injection · nlp

benchmark · arXiv · Oct 22, 2025

Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan et al. · University of Southern California · Max Planck Institute for Software Systems

Open-source LLM suite with controlled sensitive data insertions for benchmarking memorization, membership inference, and machine unlearning

Model Inversion Attack · Membership Inference Attack · Sensitive Information Disclosure · nlp

6 citations

benchmark · arXiv · Aug 22, 2025

Jonathan Nöther, Adish Singla, Goran Radanovic · Max Planck Institute for Software Systems

Benchmarks LLM multi-agent system robustness against adversarial agents that hijack inter-agent communication to elicit harmful actions

Prompt Injection · Excessive Agency · nlp