VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search
MingSheng Li 1, Guangze Zhao 1, Sichen Liu 2
Published on arXiv
2510.15948
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
VisuoAlign achieves alignment safety rates above 0.92 while substantially reducing jailbreak success rates, outperforming all baseline methods on four safety benchmarks.
VisuoAlign
Novel technique introduced
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a critical challenge.Existing defenses and vulnerable to multimodal jailbreaks, as visual inputs introduce new attack surfaces, reasoning chains lack safety supervision, and alignment often degrades under modality fusion.To overcome these limitation, we propose VisuoAlign, a framework for multi-modal safety alignment via prompt-guided tree search.VisuoAlign embeds safety constrains into the reasoning process through visual-textual interactive prompts, employs Monte Carlo Tree Search(MCTS) to systematically construct diverse safety-critical prompt trajectories, and introduces prompt-based scaling to ensure real-time risk detection and compliant responses.Extensive experiments demonstrate that VisuoAlign proactively exposes risks, enables comprehensive dataset generation, and significantly improves the robustness of LVLMs against complex cross-modal threats.
Key Contributions
- Embeds safety constraints into multimodal reasoning via visual-textual interactive prompts, achieving intrinsic safety awareness across modalities
- Uses Monte Carlo Tree Search (MCTS) to systematically generate diverse safety-critical prompt trajectories for alignment dataset construction
- Introduces prompt-based safety scaling at inference time for real-time risk detection and compliant response generation
🛡️ Threat Analysis
The threat model explicitly includes adversarial visual perturbations and hidden prompt injections in images that bypass text-based filters in VLMs — qualifying for dual ML01+LLM01 tagging per the multimodal adversarial visual input rule. VisuoAlign defends against these visual attack surfaces.