ML Security Papers

Latest papers

3 papers

tool arXiv Mar 3, 2026 · 4w ago

Zhongxi Wang, Yueqian Lin, Jingyang Zhang et al. · Duke University · Virtue AI

Open-source platform for red-teaming multimodal LLMs with multi-turn jailbreaks and cross-modal payload switching

Prompt Injection nlp multimodal

benchmark arXiv Feb 13, 2026 · 7w ago

Xu Li, Simon Yu, Minzhou Pan et al. · Northeastern University · Virtue AI +2 more

Benchmarks multi-turn jailbreaks in tool-using LLM agents and proposes ToolShield, a self-exploration defense that reduces attack success rate (ASR) by 30%

Prompt Injection Insecure Plugin Design nlp

tool arXiv Oct 3, 2025 · Oct 2025

Zhaorun Chen, Xun Liu, Mintong Kang et al. · University of Chicago · University of Illinois +2 more

Adaptive agentic red-teaming system jailbreaks VLMs with 11 multimodal attack strategies, exceeding 90% ASR on Claude-4-Sonnet

Input Manipulation Attack Prompt Injection multimodal nlp

1 citation