attack · arXiv · Jan 22, 2026
Mingyu Yu, Lana Liu, Zhehao Zhao et al. · Beijing University of Posts and Telecommunications
Jailbreaks multimodal LLMs into generating harmful images via semantic-agnostic visual splicing and inductive text recomposition, achieving 98% success on GPT-5
Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
The rapid advancement of Multimodal Large Language Models (MLLMs) has introduced complex security challenges, particularly at the intersection of textual and visual safety. While existing work has explored the security vulnerabilities of MLLMs, their visual safety boundaries remain insufficiently investigated. In this paper, we propose Beyond Visual Safety (BVS), a novel image-text pair jailbreaking framework specifically designed to probe the visual safety boundaries of MLLMs. BVS employs a "reconstruction-then-generation" strategy, leveraging neutralized visual splicing and inductive recomposition to decouple malicious intent from the raw inputs, thereby inducing MLLMs to generate harmful images. Experimental results demonstrate that BVS achieves a remarkable jailbreak success rate of 98.21% against GPT-5 (12 January 2026 release). Our findings expose critical vulnerabilities in the visual safety alignment of current MLLMs.
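To make the splicing idea concrete, here is a minimal, hypothetical sketch of the generic tiling step only: an image is cut into a grid of fragments so that no single fragment carries the complete visual semantics. All names (`splice_image`, the 2x2 grid, `input.png`) are illustrative assumptions; the paper's actual neutralization procedure and inductive text recomposition are not specified in the abstract and are not reproduced here.

```python
# Hypothetical sketch of generic visual splicing: cut an image into tiles.
# This illustrates only the tiling mechanics, not the paper's BVS pipeline.
from PIL import Image

def splice_image(path: str, rows: int = 2, cols: int = 2) -> list[Image.Image]:
    """Split an image into a rows x cols grid of fragments."""
    img = Image.open(path)
    w, h = img.size
    tile_w, tile_h = w // cols, h // rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            # Crop box: (left, upper, right, lower) in pixel coordinates.
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            tiles.append(img.crop(box))
    return tiles

if __name__ == "__main__":
    fragments = splice_image("input.png")  # hypothetical input file
    for i, frag in enumerate(fragments):
        frag.save(f"fragment_{i}.png")
```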