
VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search

MingSheng Li 1, Guangze Zhao 1, Sichen Liu 2

0 citations · 45 references · arXiv


Published on arXiv · 2510.15948

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

VisuoAlign achieves alignment safety rates above 0.92 while substantially reducing jailbreak success rates, outperforming all baseline methods on four safety benchmarks.

VisuoAlign

Novel technique introduced


Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a critical challenge. Existing defenses are vulnerable to multimodal jailbreaks: visual inputs introduce new attack surfaces, reasoning chains lack safety supervision, and alignment often degrades under modality fusion. To overcome these limitations, we propose VisuoAlign, a framework for multimodal safety alignment via prompt-guided tree search. VisuoAlign embeds safety constraints into the reasoning process through visual-textual interactive prompts, employs Monte Carlo Tree Search (MCTS) to systematically construct diverse safety-critical prompt trajectories, and introduces prompt-based safety scaling to ensure real-time risk detection and compliant responses. Extensive experiments demonstrate that VisuoAlign proactively exposes risks, enables comprehensive dataset generation, and significantly improves the robustness of LVLMs against complex cross-modal threats.


Key Contributions

  • Embeds safety constraints into multimodal reasoning via visual-textual interactive prompts, achieving intrinsic safety awareness across modalities
  • Uses Monte Carlo Tree Search (MCTS) to systematically generate diverse safety-critical prompt trajectories for alignment dataset construction
  • Introduces prompt-based safety scaling at inference time for real-time risk detection and compliant response generation
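The MCTS-driven trajectory construction in the second bullet can be sketched roughly as below. This is a generic UCT-based search, not the paper's implementation: the action space `ACTIONS`, the reward function `safety_risk`, and all hyperparameters are illustrative placeholders, since the excerpt does not specify them (a real reward would come from querying the LVLM and a safety judge on each trajectory).

```python
import math
import random

# Hypothetical prompt-editing actions; the paper's real action space is not
# given in this summary.
ACTIONS = ["rephrase", "add_image_caption", "obfuscate", "escalate"]

class Node:
    def __init__(self, trajectory, parent=None):
        self.trajectory = trajectory  # sequence of prompt-editing actions
        self.parent = parent
        self.children = {}            # action -> Node
        self.visits = 0
        self.value = 0.0              # accumulated safety-risk reward

def safety_risk(trajectory):
    # Placeholder reward: rewards "escalating" and diverse trajectories.
    # A real system would score the LVLM's response with a safety judge.
    return 0.3 * trajectory.count("escalate") + 0.05 * len(set(trajectory))

def uct(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits)

def mcts(iterations=200, max_depth=4):
    root = Node([])
    for _ in range(iterations):
        node = root
        # Selection: descend by UCT while the node is fully expanded.
        while len(node.children) == len(ACTIONS) and len(node.trajectory) < max_depth:
            node = max(node.children.values(),
                       key=lambda ch: uct(ch, node.visits))
        # Expansion: try one untried action.
        if len(node.trajectory) < max_depth:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            child = Node(node.trajectory + [action], parent=node)
            node.children[action] = child
            node = child
        # Simulation: random rollout to max depth.
        rollout = list(node.trajectory)
        while len(rollout) < max_depth:
            rollout.append(random.choice(ACTIONS))
        reward = safety_risk(rollout)
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the highest mean-reward first action as a trajectory seed.
    best = max(root.children.values(), key=lambda ch: ch.value / ch.visits)
    return best.trajectory

if __name__ == "__main__":
    print(mcts())
```

Trajectories collected this way (including intermediate nodes) would then serve as the safety-critical prompts for alignment dataset construction.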

🛡️ Threat Analysis

Input Manipulation Attack

The threat model explicitly includes adversarial visual perturbations and hidden prompt injections in images that bypass text-based filters in VLMs — qualifying for dual ML01+LLM01 tagging per the multimodal adversarial visual input rule. VisuoAlign defends against these visual attack surfaces.
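One way to picture the inference-time, prompt-based safety scaling this defends with is a gate that screens both the user text and any text embedded in the image (e.g. via OCR) before the model answers. Everything below is an assumed sketch: `RISK_PATTERNS`, the `SAFETY_PROMPT` template, and the function names are hypothetical and not taken from the paper.

```python
# Hypothetical risk patterns; a real detector would be a learned safety
# classifier, not a keyword list.
RISK_PATTERNS = ["ignore previous instructions", "bypass", "weapon"]

# Assumed visual-textual interactive safety prompt (illustrative only).
SAFETY_PROMPT = (
    "Before answering, describe the image content and state whether the "
    "combined visual-textual request violates safety policy."
)

def screen_request(user_text, ocr_text_from_image):
    """Return the risk patterns matched in the text or in image-embedded text."""
    combined = f"{user_text} {ocr_text_from_image}".lower()
    return [p for p in RISK_PATTERNS if p in combined]

def respond(user_text, ocr_text_from_image, model_call):
    hits = screen_request(user_text, ocr_text_from_image)
    if hits:
        # Hidden prompt injections in the image are caught here, not just
        # in the visible user text.
        return f"Request declined: matched risk pattern(s) {hits}."
    # Otherwise, prepend the safety prompt before the model sees the input.
    return model_call(f"{SAFETY_PROMPT}\n\nUser: {user_text}")
```

The key point matching the threat model above: the screen runs over the image-derived text too, so an injection hidden in pixels cannot slip past a text-only filter.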


Details

Domains
vision, nlp, multimodal
Model Types
vlm, llm
Threat Tags
inference_time, training_time
Datasets
safety benchmarks (4, names not specified in excerpt)
Applications
large vision-language models, multimodal safety alignment, cross-modal jailbreak defense