
VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense

Nadav Kadvil, Ayellet Tal

0 citations · 36 references · arXiv (Cornell University)


Published on arXiv · 2602.19570

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves state-of-the-art defense accuracy against adversarial images while letting most clean inputs bypass costly processing, keeping computational overhead minimal.

VALD

Novel technique introduced


Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. A key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining notable efficiency: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
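The staged detection described above can be sketched as a simple cascade. All function names, thresholds, and the agreement/distance heuristics below are illustrative assumptions, not the paper's actual implementation:

```python
import math

def transform_consistency(responses):
    """Stage 1 (cheap): fraction of responses to transformed images that
    agree with the response to the original image."""
    original, rest = responses[0], responses[1:]
    return sum(r == original for r in rest) / len(rest)

def embedding_discrepancy(embeddings):
    """Stage 2: max pairwise L2 distance between response embeddings
    (plain lists stand in for real text embeddings here)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return max(dist(a, b) for i, a in enumerate(embeddings)
               for b in embeddings[i + 1:])

def detect(responses, embeddings,
           consistency_thresh=0.8, discrepancy_thresh=1.0):
    """Return 'clean', or 'suspicious' (which would trigger the costly
    Stage-3 LLM consolidation in the actual pipeline)."""
    if transform_consistency(responses) >= consistency_thresh:
        return "clean"        # most clean inputs exit at this cheap stage
    if embedding_discrepancy(embeddings) < discrepancy_thresh:
        return "clean"        # mild drift in embedding space, still benign
    return "suspicious"       # attack-induced divergence: escalate to LLM

# Identical answers under all transformations -> filtered out at Stage 1
print(detect(["a red car"] * 5, [[0.0, 0.0]] * 5))  # → clean
```

The point of the cascade is cost ordering: string-level agreement is nearly free, embedding comparison is cheap, and only inputs that fail both checks pay for an LLM call.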


Key Contributions

  • Training-free two-stage pre-detection mechanism that filters clean inputs cheaply via image consistency checks and text-embedding discrepancy analysis before invoking a costly LLM
  • Agentic LLM consolidation strategy that leverages both similarities and differences across multiple transformed-image responses to recover correct LVLM behavior
  • Efficient defense pipeline where most clean images skip expensive processing, keeping overhead minimal even under high adversarial example rates
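The consolidation idea in the second contribution can be illustrated with a minimal stand-in. The paper uses an agentic LLM to reason over similarities and differences across transformed-image responses; the majority vote below is a deliberately simplified proxy for that step:

```python
from collections import Counter

def consolidate(responses):
    """Pick the answer most transformed views agree on, with its support.

    Intuition: adversarial perturbations rarely survive content-preserving
    transformations, so the correct answer tends to dominate the views.
    """
    counts = Counter(responses)
    answer, support = counts.most_common(1)[0]
    return answer, support / len(responses)

# One perturbed view disagrees; the clean majority answer is recovered.
views = ["a dog on grass", "a dog on grass",
         "a cat on grass", "a dog on grass"]
print(consolidate(views))  # → ('a dog on grass', 0.75)
```

A plain vote discards the information carried by the *differences* between responses, which is exactly what the LLM-based consolidation exploits; this sketch only captures the agreement side.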

🛡️ Threat Analysis

Input Manipulation Attack

Directly defends against adversarial images (imperceptible perturbations) that cause LVLMs to produce incorrect outputs at inference time — classic input manipulation attack defense with image transformation-based detection and purification.


Details

Domains
vision · nlp · multimodal
Model Types
vlm · llm
Threat Tags
grey_box · inference_time · digital
Applications
image captioning · visual question answering · large vision-language models