VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense
Published on arXiv
2602.19570
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves state-of-the-art defense accuracy against adversarial images while most clean inputs bypass costly processing, keeping computational overhead minimal.
VALD
Novel technique introduced
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. A key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining notable efficiency: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
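The staged gate described above can be sketched as a simple control flow: a cheap image-consistency check first, a text-embedding discrepancy check second, and only then the expensive LLM stage. All function names, thresholds, and the transform set below are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical thresholds; the paper does not report these exact values here.
CONSISTENCY_THRESHOLD = 0.9
EMBEDDING_THRESHOLD = 0.8

@dataclass
class Verdict:
    stage: str        # which stage produced the decision
    adversarial: bool

def detect(
    image,
    transforms: List[Callable],
    caption: Callable,           # LVLM response for an image (stubbed in practice)
    consistency: Callable,       # cheap image-space consistency score in [0, 1]
    embed_similarity: Callable,  # similarity of two responses in text-embedding space
) -> Verdict:
    """Two-stage pre-detection gate; most clean images exit at stage 1 or 2."""
    variants = [t(image) for t in transforms]

    # Stage 1: image consistency under content-preserving transformations,
    # at negligible computational cost.
    if min(consistency(image, v) for v in variants) >= CONSISTENCY_THRESHOLD:
        return Verdict("stage1-consistency", adversarial=False)

    # Stage 2: compare LVLM responses for the variants in a text-embedding space.
    base = caption(image)
    if min(embed_similarity(base, caption(v)) for v in variants) >= EMBEDDING_THRESHOLD:
        return Verdict("stage2-embedding", adversarial=False)

    # Stage 3 (not sketched): a powerful LLM consolidates the divergent responses.
    return Verdict("stage3-llm-consolidation", adversarial=True)
```

In this sketch, only inputs that fail both cheap checks ever reach the LLM, which is what keeps overhead minimal even when many inputs are adversarial.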
Key Contributions
- Training-free two-stage pre-detection mechanism that filters clean inputs cheaply via image consistency checks and text-embedding discrepancy analysis before invoking a costly LLM
- Agentic LLM consolidation strategy that leverages both similarities and differences across multiple transformed-image responses to recover correct LVLM behavior
- Efficient defense pipeline where most clean images skip expensive processing, keeping overhead minimal even under high adversarial example rates
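The consolidation step in the second contribution can be approximated, very roughly, by a medoid vote over the responses obtained from the transformed images: pick the response most similar on average to the others. The actual method uses an agentic LLM that reasons over both the similarities and the differences between responses, so this string-similarity medoid is only an illustrative stand-in.

```python
from difflib import SequenceMatcher

def consolidate(responses: list[str]) -> str:
    """Return the medoid response: the one with the highest mean similarity
    to all responses. A crude stand-in for the paper's agentic LLM
    consolidation, which also exploits the *differences* between responses."""
    def sim(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()
    return max(responses, key=lambda r: sum(sim(r, o) for o in responses))
```

The intuition carried over from the paper is that an adversarial perturbation rarely survives every content-preserving transformation, so the attack-induced response is the outlier and the consensus response reflects the image's true content.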
🛡️ Threat Analysis
Directly defends against adversarial images (imperceptible perturbations) that cause LVLMs to produce incorrect outputs at inference time — a classic input-manipulation defense combining image-transformation-based detection with purification.