
On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression

Xinwei Zhang 1, Hangcheng Liu 2, Li Bai 1, Hao Wang 3, Qingqing Ye 1, Tianwei Zhang 2, Haibo Hu 1

0 citations · 39 references · arXiv


Published on arXiv

arXiv:2601.21531

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

CAGE consistently achieves lower robust accuracy than encoder-based baseline attacks across diverse visual token compression mechanisms, revealing that robustness assessments that ignore compression are overly optimistic.

CAGE (Compression-AliGnEd attack)

Novel technique introduced


Visual token compression is widely used to accelerate large vision-language models (LVLMs) by pruning or merging visual tokens, yet its adversarial robustness remains unexplored. We show that existing encoder-based attacks can substantially overestimate the robustness of compressed LVLMs, due to an optimization-inference mismatch: perturbations are optimized on the full-token representation, while inference is performed through a token-compression bottleneck. To address this gap, we propose the Compression-AliGnEd attack (CAGE), which aligns perturbation optimization with compression inference without assuming access to the deployed compression mechanism or its token budget. CAGE combines (i) expected feature disruption, which concentrates distortion on tokens likely to survive across plausible budgets, and (ii) rank distortion alignment, which actively aligns token distortions with rank scores to promote the retention of highly distorted evidence. Across diverse representative plug-and-play compression mechanisms and datasets, our results show that CAGE consistently achieves lower robust accuracy than the baseline. This work highlights that robustness assessments ignoring compression can be overly optimistic, calling for compression-aware security evaluation and defenses for efficient LVLMs.
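The two CAGE components described above can be sketched as a single objective: (i) per-token distortion weighted by the probability that a token survives pruning under a set of plausible budgets, plus (ii) an alignment term that pushes distortion onto highly ranked tokens. The following numpy sketch is illustrative only; the function names, the uniform budget set, and the cosine-alignment formulation are assumptions, not the paper's implementation.

```python
import numpy as np

def survival_prob(scores, budgets):
    """Probability a token survives top-k pruning, averaged over a set of
    plausible token budgets (assumed uniform here for illustration)."""
    order = np.argsort(-scores)              # indices sorted by descending rank score
    probs = np.zeros(len(scores))
    for k in budgets:
        probs[order[:k]] += 1.0 / len(budgets)
    return probs

def cage_style_loss(f_clean, f_adv, scores, budgets=(4, 8, 16)):
    """Hypothetical compression-aligned objective combining:
    (i)  expected feature disruption: distortion weighted by survival probability
    (ii) rank distortion alignment: cosine alignment between per-token
         distortion magnitudes and rank scores (our stand-in formulation)."""
    dist = np.linalg.norm(f_adv - f_clean, axis=1)   # per-token feature distortion
    p = survival_prob(scores, budgets)
    efd = np.sum(p * dist ** 2)                      # term (i)
    align = np.dot(dist, scores) / (
        np.linalg.norm(dist) * np.linalg.norm(scores) + 1e-8
    )                                                # term (ii)
    return efd + align
```

Under this toy objective, concentrating the same total distortion on high-ranked tokens yields a strictly larger loss than spending it on tokens that compression would discard, which is the intuition behind optimizing through the compression bottleneck.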


Key Contributions

  • Identifies the optimization-inference mismatch in adversarial attacks on compressed LVLMs, demonstrating that existing attacks overestimate robustness by ignoring the token-compression bottleneck
  • Proposes CAGE (Compression-AliGnEd attack) combining expected feature disruption and rank distortion alignment, without requiring access to the deployed compression mechanism or its token budget
  • Empirically demonstrates CAGE achieves consistently lower robust accuracy than baseline attacks across diverse plug-and-play compression mechanisms, and explores initial compression-aware defenses

🛡️ Threat Analysis

Input Manipulation Attack

CAGE is a gradient-based adversarial perturbation attack on the visual inputs of VLMs at inference time. It directly targets the optimization-inference mismatch introduced by token compression, driving robust accuracy below what encoder-based attacks achieve. This is a classic ML01 adversarial-example attack; the novel contribution is aligning perturbation optimization with the compression bottleneck.
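At its core, a gradient-based ML01 attack of this kind iterates projected gradient ascent on the input under an L∞ budget. The sketch below is a generic PGD loop, not CAGE itself: `grad_fn`, the step size `alpha`, and the budget `eps` are placeholder assumptions; CAGE would swap in a compression-aligned loss gradient.

```python
import numpy as np

def pgd_linfty(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """Minimal projected-gradient-ascent loop under an L-infinity budget.
    `grad_fn(x_adv)` is assumed to return dLoss/dx for the attacker's loss;
    the perturbation stays within eps of the clean input x in [0, 1]."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + alpha * np.sign(g)        # ascend the attack loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to the L-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv
```

The grey-box threat tag fits this shape: the attacker needs gradients through some surrogate (e.g., the vision encoder) but, per the paper, not the deployed compression mechanism or its token budget.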


Details

Domains
vision, multimodal
Model Types
vlm
Threat Tags
grey_box, inference_time, digital
Applications
visual question answering, image understanding, vision-language models