defense arXiv Oct 10, 2025 · Oct 2025
MingSheng Li, Guangze Zhao, Sichen Liu · Harbin Institute of Technology · Xi’an Jiaotong-Liverpool University
Defends LVLMs against multimodal jailbreaks using MCTS-guided safety prompt trajectories embedded in the reasoning chain
Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a critical challenge. Existing defenses are vulnerable to multimodal jailbreaks: visual inputs introduce new attack surfaces, reasoning chains lack safety supervision, and alignment often degrades under modality fusion. To overcome these limitations, we propose VisuoAlign, a framework for multimodal safety alignment via prompt-guided tree search. VisuoAlign embeds safety constraints into the reasoning process through visual-textual interactive prompts, employs Monte Carlo Tree Search (MCTS) to systematically construct diverse safety-critical prompt trajectories, and introduces prompt-based scaling to ensure real-time risk detection and compliant responses. Extensive experiments demonstrate that VisuoAlign proactively exposes risks, enables comprehensive dataset generation, and significantly improves the robustness of LVLMs against complex cross-modal threats.
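As a rough illustration of what MCTS over prompt trajectories could look like, here is a minimal generic sketch; it is not the VisuoAlign implementation. The action names in `ACTIONS`, the depth limit, and `toy_risk_score` are hypothetical stand-ins: the abstract does not specify the action set or the reward, and a real system would presumably query an LVLM and a safety judge instead. The sketch assumes that trajectories exposing more risk receive higher reward.

```python
import math
import random

# Hypothetical prompt-editing actions (assumption; not taken from the paper).
ACTIONS = ["add_image_reference", "rephrase_as_roleplay",
           "split_request", "append_safety_check"]
MAX_DEPTH = 3  # hypothetical trajectory length


def toy_risk_score(trajectory):
    """Toy stand-in for a safety judge; returns a value in [0, 1]."""
    risky = {"rephrase_as_roleplay", "split_request"}
    return sum(a in risky for a in trajectory) / MAX_DEPTH


class Node:
    def __init__(self, trajectory, parent=None):
        self.trajectory = trajectory  # tuple of actions taken so far
        self.parent = parent
        self.children = {}            # action -> Node
        self.visits = 0
        self.value = 0.0              # sum of rewards seen through this node

    def untried(self):
        """Actions not yet expanded; empty once the depth limit is reached."""
        if len(self.trajectory) >= MAX_DEPTH:
            return []
        return [a for a in ACTIONS if a not in self.children]


def uct(child, parent_visits, c=1.4):
    """Standard UCT score: mean reward plus an exploration bonus."""
    mean = child.value / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCT.
        while not node.untried() and node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: uct(ch, parent.visits))
        # Expansion: add one untried child, if any remain.
        untried = node.untried()
        if untried:
            action = rng.choice(untried)
            child = Node(node.trajectory + (action,), parent=node)
            node.children[action] = child
            node = child
        # Simulation: complete the trajectory with random actions.
        rollout = list(node.trajectory)
        while len(rollout) < MAX_DEPTH:
            rollout.append(rng.choice(ACTIONS))
        reward = toy_risk_score(rollout)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


def collect_trajectories(node):
    """Return every full-depth trajectory present in the tree."""
    if len(node.trajectory) == MAX_DEPTH:
        return [node.trajectory]
    found = []
    for child in node.children.values():
        found.extend(collect_trajectories(child))
    return found


root = search()
trajectories = collect_trajectories(root)
```

The full-depth trajectories gathered by `collect_trajectories` play the role of the "diverse safety-critical prompt trajectories" mentioned above; how VisuoAlign actually scores, filters, or uses them is not described in the abstract.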
vlm · llm