Latest papers

3 papers
attack · arXiv · Jan 20, 2026

AgenticRed: Optimizing Agentic Systems for Automated Red-teaming

Jiayi Yuan, Jonathan Nöther, Natasha Jaques et al. · University of Washington · Max Planck Institute for Software Systems

Evolutionary meta-search automatically designs agentic jailbreak pipelines that achieve a 96-100% attack success rate (ASR) on Llama, GPT-4o, and Claude

Prompt Injection · nlp
benchmark · arXiv · Oct 22, 2025

Hubble: a Model Suite to Advance the Study of LLM Memorization

Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan et al. · University of Southern California · Max Planck Institute for Software Systems

Open-source LLM suite with controlled sensitive data insertions for benchmarking memorization, membership inference, and machine unlearning

Model Inversion Attack · Membership Inference Attack · Sensitive Information Disclosure · nlp
6 citations
benchmark · arXiv · Aug 22, 2025

Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms

Jonathan Nöther, Adish Singla, Goran Radanovic · Max Planck Institute for Software Systems

Benchmarks LLM multi-agent system robustness against adversarial agents that hijack inter-agent communication to elicit harmful actions

Prompt Injection · Excessive Agency · nlp