Attack · 2025

VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models

Ravikumar Balakrishnan 1, Mansi Phute 2

1 citation · 24 references · arXiv


Published on arXiv (2509.25533)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

A single jointly-optimized VISOR++ image matches the effect of internal steering vectors across LLaVA-1.5-7B and IDEFICS2-8B for refusal, sycophancy, and survival instinct alignment directions, and transfers directionally to unseen closed-source VLMs while preserving 99.9% performance on unrelated MMLU tasks.

VISOR++

Novel technique introduced


As Vision Language Models (VLMs) are deployed across safety-critical applications, understanding and controlling their behavioral patterns has become increasingly important. Existing behavioral control methods face significant limitations: system prompts can be easily overridden by user instructions, while activation-based steering vectors require invasive runtime access to model internals, precluding deployment with API-based services and closed-source models. Finding steering methods that transfer across multiple VLMs also remains an open research problem. To this end, we introduce universal visual input based steering for output redirection (VISOR++), which achieves behavioral control through optimized visual inputs alone. We demonstrate that a single VISOR++ image can be generated for an ensemble of VLMs to emulate each of their steering vectors. By crafting universal visual inputs that induce target activation patterns, VISOR++ eliminates the need for runtime model access and remains deployment-agnostic: whenever the underlying model accepts image inputs, its behavior can be steered by supplying an image in place of runtime steering-vector interventions. We first demonstrate the effectiveness of VISOR++ images on the open-access models LLaVA-1.5-7B and IDEFICS2-8B along three alignment directions: refusal, sycophancy, and survival instinct. Both the model-specific steering images and the jointly optimized images achieve performance parity with steering vectors on both positive and negative steering tasks. We also show the promise of VISOR++ images in achieving directional behavioral shifts on unseen models, both open-access and closed-access. Furthermore, VISOR++ images preserve 99.9% performance on 14,000 unrelated MMLU evaluation tasks.
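The steering vectors that VISOR++ emulates come from activation-steering methods, which typically extract a direction as the mean difference of hidden activations between contrastive prompt sets and then add it to the residual stream at inference. A minimal numpy sketch of that extraction and the runtime intervention it replaces (the toy data and function names are illustrative, not from the paper):

```python
import numpy as np

def steering_vector(acts_pos, acts_neg):
    """Contrastive steering direction: mean activation difference between
    prompts that exhibit a behavior and prompts that avoid it."""
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)

def steer(hidden, v, alpha=1.0):
    """Runtime intervention: shift a hidden state along the steering direction.
    alpha > 0 promotes the behavior, alpha < 0 suppresses it."""
    return hidden + alpha * v

rng = np.random.default_rng(0)
d = 8                                  # toy hidden dimension
v_true = rng.normal(size=d)
acts_neg = rng.normal(size=(16, d))
acts_pos = acts_neg + v_true           # toy data: behavior adds a fixed offset

v = steering_vector(acts_pos, acts_neg)
h = rng.normal(size=d)
h_steered = steer(h, v, alpha=-1.0)    # negative steering pushes away
```

VISOR++'s point is that this `steer` call requires access to internals; the optimized image induces an equivalent activation shift from the input side.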


Key Contributions

  • Universal adversarially optimized visual inputs (VISOR++ images) that steer VLM behavior along alignment axes (refusal, sycophancy, survival instinct) without requiring runtime access to model internals
  • Joint optimization over an ensemble of VLMs to produce a single universal steering image that transfers to unseen open- and closed-source models
  • Demonstration that VISOR++ achieves performance parity with internal activation steering vectors while preserving 99.9% utility on 14,000 unrelated MMLU tasks
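The joint optimization in the second contribution can be pictured as minimizing a summed activation-matching loss over the ensemble with one shared image. A toy numpy sketch under strong simplifying assumptions (each "model" is a fixed linear map standing in for a frozen vision pathway, and each target is the activation pattern its steering vector would induce; the real method backpropagates through full VLMs):

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, d = 32, 8
# Hypothetical two-model ensemble (stand-ins for e.g. LLaVA and IDEFICS2).
models = [rng.normal(size=(d, n_pix)) / np.sqrt(n_pix) for _ in range(2)]
targets = [rng.normal(size=d) for _ in range(2)]

def joint_loss(x):
    """Sum of per-model activation-matching losses for one shared image."""
    return sum(np.sum((W @ x - t) ** 2) for W, t in zip(models, targets))

x = rng.uniform(0.0, 1.0, size=n_pix)  # the single universal steering image
loss0 = joint_loss(x)
lr = 0.05
for _ in range(1000):
    # Gradient of the summed quadratic loss, then project back to valid pixels.
    grad = sum(2 * W.T @ (W @ x - t) for W, t in zip(models, targets))
    x = np.clip(x - lr * grad, 0.0, 1.0)
```

Because every model contributes to the same gradient, the optimizer is pushed toward an image that approximates each model's target activations simultaneously, which is what makes transfer to the whole ensemble possible.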

🛡️ Threat Analysis

Input Manipulation Attack

VISOR++ uses gradient-based optimization to craft adversarial visual inputs that induce targeted activation patterns in VLMs. This is directly analogous to adversarial perturbation attacks: the visual input is engineered to cause specific, targeted behavioral shifts at inference time.
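The analogy to adversarial perturbation attacks can be made concrete with a projected-gradient loop that nudges an image until a (stand-in) encoder's activations approach a target pattern. The L∞ budget below mirrors classic adversarial examples; VISOR++ itself optimizes the full image rather than a bounded perturbation, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix, d = 32, 8
W = rng.normal(size=(d, n_pix)) / np.sqrt(n_pix)  # toy frozen vision encoder
target = rng.normal(size=d)                        # desired activation pattern

x_base = rng.uniform(0.2, 0.8, size=n_pix)         # benign carrier image
eps, lr = 0.2, 0.02                                # L-inf budget and step size

delta = np.zeros(n_pix)
loss0 = np.sum((W @ x_base - target) ** 2)
for _ in range(300):
    act = W @ (x_base + delta)
    grad = 2 * W.T @ (act - target)                # d/d(delta) of the squared error
    delta = np.clip(delta - lr * grad, -eps, eps)  # gradient step + L-inf projection

x_adv = np.clip(x_base + delta, 0.0, 1.0)
loss = np.sum((W @ x_adv - target) ** 2)
```

The white_box tag applies during crafting (gradients through the ensemble are needed), while the black_box tag reflects deployment: once optimized, the image steers models through the input channel alone.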


Details

Domains
vision · nlp · multimodal
Model Types
vlm · llm · multimodal
Threat Tags
white_box · black_box · inference_time · targeted · digital
Datasets
MMLU · LLaVA-1.5-7B evaluations · IDEFICS2-8B evaluations
Applications
vision-language model behavioral control · safety alignment steering · vlm jailbreaking/refusal manipulation