ML Security Papers

Latest papers

5 papers

attack arXiv Mar 25, 2026

Alexander Panfilov, Peter Romov, Igor Shilov et al. · MATS · ELLIS Institute Tübingen +3 more

AI agent autonomously discovers novel white-box jailbreak attacks that outperform 30+ existing methods, reaching a 100% attack success rate (ASR) on target models

Input Manipulation Attack · Prompt Injection · Red-Team Agents · Exploit Generation · nlp

benchmark arXiv Feb 12, 2026 · Feb 2026

Yuepeng Hu, Yuqi Jia, Mengyuan Li et al. · Duke University · UC Berkeley

Benchmarks malicious tool code attacks on LLM agents; coding LLMs generate evasive malware that defeats VirusTotal and agent-specific detectors

AI Supply Chain Attacks · Insecure Plugin Design · Exploit Generation · nlp

attack arXiv Dec 28, 2025 · Dec 2025

Moustapha Awwalou Diouf, Maimouna Tamah Diao, Iyiola Emmanuel Olatunji et al. · University of Luxembourg · University Cheikh Anta Diop +1 more

RSA pretexting strategy jailbreaks five major LLMs to generate working CVE exploits for ERP software with 100% success rate

Prompt Injection · Exploit Generation · Red-Team Agents · nlp

attack arXiv Dec 24, 2025 · Dec 2025

Tianwei Lan, Farid Naït-Abdesselam · Université Paris Cité

Uses LLMs as dual-agent attackers to craft feature-level adversarial APKs that evade Android malware ML classifiers with 97% success

Input Manipulation Attack · Red-Team Agents · Exploit Generation · tabular · nlp

attack arXiv Sep 30, 2025 · Sep 2025

Simin Chen, Yixin He, Suman Jana et al. · Columbia University · University of Southern California

Indirect prompt injection via adversarial GitHub issues tricks LLM repair agents into generating correct-but-vulnerable patches

Prompt Injection · Excessive Agency · Exploit Generation · nlp

2 citations