Latest papers

3 papers
tool arXiv Mar 3, 2026 · 4w ago

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Zhongxi Wang, Yueqian Lin, Jingyang Zhang et al. · Duke University · Virtue AI

Open-source platform for red-teaming multimodal LLMs with multi-turn jailbreaks and cross-modal payload switching

Prompt Injection · nlp · multimodal
PDF
benchmark arXiv Feb 13, 2026 · 7w ago

Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

Xu Li, Simon Yu, Minzhou Pan et al. · Northeastern University · Virtue AI +2 more

Benchmarks multi-turn jailbreaks in tool-using LLM agents and proposes ToolShield, a self-exploration defense that reduces attack success rate (ASR) by 30%

Prompt Injection · Insecure Plugin Design · nlp
PDF Code
tool arXiv Oct 3, 2025 · Oct 2025

ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks

Zhaorun Chen, Xun Liu, Mintong Kang et al. · University of Chicago · University of Illinois +2 more

Adaptive agentic red-teaming system jailbreaks VLMs with 11 multimodal attack strategies, exceeding 90% ASR on Claude-4-Sonnet

Input Manipulation Attack · Prompt Injection · multimodal · nlp
1 citation PDF Code