Latest papers

5 papers
attack arXiv Mar 25, 2026 · Mar 2026

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov, Peter Romov, Igor Shilov et al. · MATS · ELLIS Institute Tübingen +3 more

AI agent autonomously discovers novel white-box jailbreak attacks that outperform 30+ existing methods and reach 100% attack success rate (ASR) on target models

Input Manipulation Attack Prompt Injection Red-Team Agents Exploit Generation nlp
benchmark arXiv Feb 12, 2026 · Feb 2026

MalTool: Malicious Tool Attacks on LLM Agents

Yuepeng Hu, Yuqi Jia, Mengyuan Li et al. · Duke University · UC Berkeley

Benchmarks malicious tool code attacks on LLM agents; coding LLMs generate evasive malware that defeats VirusTotal and agent-specific detectors

AI Supply Chain Attacks Insecure Plugin Design Exploit Generation nlp
attack arXiv Dec 28, 2025 · Dec 2025

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

Moustapha Awwalou Diouf, Maimouna Tamah Diao, Iyiola Emmanuel Olatunji et al. · University of Luxembourg · University Cheikh Anta Diop +1 more

RSA pretexting strategy jailbreaks five major LLMs to generate working CVE exploits for ERP software with 100% success rate

Prompt Injection Exploit Generation Red-Team Agents nlp
attack arXiv Dec 24, 2025 · Dec 2025

LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors

Tianwei Lan, Farid Naït-Abdesselam · Université Paris Cité

Uses LLMs as dual-agent attackers to craft feature-level adversarial APKs that evade Android malware ML classifiers with 97% success

Input Manipulation Attack Red-Team Agents Exploit Generation tabular nlp
attack arXiv Sep 30, 2025 · Sep 2025

Red Teaming Program Repair Agents: When Correct Patches can Hide Vulnerabilities

Simin Chen, Yixin He, Suman Jana et al. · Columbia University · University of Southern California

Indirect prompt injection via adversarial GitHub issues tricks LLM repair agents into generating correct-but-vulnerable patches

Prompt Injection Excessive Agency Exploit Generation nlp
2 citations