VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense
Published on arXiv
2602.19570
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves state-of-the-art defense accuracy against adversarial images while most clean inputs bypass costly processing, keeping computational overhead minimal.
VALD
Novel technique introduced
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. A key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining notable efficiency: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
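The staged gate described above can be sketched as a simple control flow: a cheap image-consistency check first, a text-embedding discrepancy check second, and only then the expensive LLM stage. All function names, thresholds, and the transform set below are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical thresholds; the paper does not report these exact values here.
CONSISTENCY_THRESHOLD = 0.9
EMBEDDING_THRESHOLD = 0.8

@dataclass
class Verdict:
    stage: str        # which stage produced the decision
    adversarial: bool

def detect(
    image,
    transforms: List[Callable],
    caption: Callable,           # LVLM response for an image (stubbed in practice)
    consistency: Callable,       # cheap image-space consistency score in [0, 1]
    embed_similarity: Callable,  # similarity of two responses in text-embedding space
) -> Verdict:
    """Two-stage pre-detection gate; most clean images exit at stage 1 or 2."""
    variants = [t(image) for t in transforms]

    # Stage 1: image consistency under content-preserving transformations,
    # at negligible computational cost.
    if min(consistency(image, v) for v in variants) >= CONSISTENCY_THRESHOLD:
        return Verdict("stage1-consistency", adversarial=False)

    # Stage 2: compare LVLM responses for the variants in a text-embedding space.
    base = caption(image)
    if min(embed_similarity(base, caption(v)) for v in variants) >= EMBEDDING_THRESHOLD:
        return Verdict("stage2-embedding", adversarial=False)

    # Stage 3 (not sketched): a powerful LLM consolidates the divergent responses.
    return Verdict("stage3-llm-consolidation", adversarial=True)
```

In this sketch, only inputs that fail both cheap checks ever reach the LLM, which is what keeps overhead minimal even when many inputs are adversarial.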
Key Contributions
- Training-free two-stage pre-detection mechanism that filters clean inputs cheaply via image consistency checks and text-embedding discrepancy analysis before invoking a costly LLM
- Agentic LLM consolidation strategy that leverages both similarities and differences across multiple transformed-image responses to recover correct LVLM behavior
- Efficient defense pipeline where most clean images skip expensive processing, keeping overhead minimal even under high adversarial example rates
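The consolidation step in the second contribution can be approximated, very roughly, by a medoid vote over the responses obtained from the transformed images: pick the response most similar on average to the others. The actual method uses an agentic LLM that reasons over both the similarities and the differences between responses, so this string-similarity medoid is only an illustrative stand-in.

```python
from difflib import SequenceMatcher

def consolidate(responses: list[str]) -> str:
    """Return the medoid response: the one with the highest mean similarity
    to all responses. A crude stand-in for the paper's agentic LLM
    consolidation, which also exploits the *differences* between responses."""
    def sim(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()
    return max(responses, key=lambda r: sum(sim(r, o) for o in responses))
```

The intuition carried over from the paper is that an adversarial perturbation rarely survives every content-preserving transformation, so the attack-induced response is the outlier and the consensus response reflects the image's true content.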
🛡️ Threat Analysis
Directly defends against adversarial images (imperceptible perturbations) that cause LVLMs to produce incorrect outputs at inference time — a classic input-manipulation defense combining image-transformation-based detection with purification.