
Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Mengqi He 1, Xinyu Tian 1, Xin Shen 2, Jinhong Ni 1, Shu Zou 1, Zhaoyuan Yang 3, Jing Zhang 1



Published on arXiv (2512.21815)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Concentrating adversarial perturbations on the ~20% of high-entropy decoding positions achieves 93-95% attack success rate and converts 35-49% of benign VLM outputs to harmful content, with 17-26% harmful transfer rates to unseen architectures.

EGA (Entropy-bank Guided Adversarial attacks)

Novel technique introduced


Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliability of VLM outputs. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to generation instability. We show instead that a small fraction (about 20%) of high-entropy tokens, i.e., critical decision points in autoregressive generation, disproportionately governs output trajectories. By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk. Remarkably, these vulnerable high-entropy forks recur across architecturally diverse VLMs, enabling transferable attacks (17-26% harmful rates on unseen targets). Motivated by these findings, we propose Entropy-bank Guided Adversarial attacks (EGA), which achieves competitive attack success rates (93-95%) alongside high harmful conversion, thereby revealing new weaknesses in current VLM safety mechanisms.
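The position-selection idea in the abstract can be sketched as follows. This is a minimal, illustrative re-implementation (not the authors' code): it computes the Shannon entropy of each per-step next-token distribution and keeps the top ~20% of decoding positions, the "forks" on which EGA concentrates its perturbation budget. The function names and the NumPy formulation are assumptions for clarity.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution
    at each decoding step; `logits` has shape (steps, vocab)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def select_high_entropy_positions(logits, frac=0.2):
    """Indices of the top `frac` decoding steps by entropy.

    Illustrative sketch of the paper's selection criterion: these
    high-entropy positions are where the adversarial budget is spent.
    """
    h = token_entropy(logits)
    k = max(1, int(round(frac * len(h))))
    return np.argsort(h)[::-1][:k]
```

A near-deterministic step (one dominant logit) yields low entropy and is skipped; a step where the model is undecided (near-uniform logits) yields high entropy and is selected.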


Key Contributions

  • Empirical finding that ~20% of high-entropy tokens disproportionately govern autoregressive output trajectories in VLMs, invalidating the equal-contribution assumption of prior entropy-based attacks.
  • Demonstration that selectively targeting high-entropy 'fork' positions converts 35-49% of benign VLM outputs to harmful ones — exposing a deeper safety risk than semantic degradation alone.
  • EGA (Entropy-bank Guided Adversarial attacks) achieves 93-95% attack success rate with smaller perturbation budgets, and achieves 17-26% harmful conversion on unseen VLM architectures via transferability.

🛡️ Threat Analysis

Input Manipulation Attack

The core contribution is a set of gradient-based adversarial perturbations applied to the visual inputs of VLMs at inference time, concentrating the attack budget on high-entropy token positions to maximize semantic disruption. This makes it a direct input manipulation attack.
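The perturbation side of such an attack typically follows a standard projected-gradient (PGD-style) loop. The sketch below shows one ascent step on the image, assuming the caller has already obtained the gradient of the masked objective (e.g., summed entropy at the selected high-entropy positions) with respect to the input via the model's autograd; the gradient is treated as given here, and all names and default budgets are illustrative, not the paper's exact configuration.

```python
import numpy as np

def pgd_step(image, grad, step_size=1/255, eps=8/255, image_orig=None):
    """One projected-gradient ascent step under an L_inf budget.

    `grad` is the gradient of the attack objective (entropy summed over
    the selected high-entropy positions) w.r.t. the input image, computed
    elsewhere by backpropagation through the VLM.
    """
    if image_orig is None:
        image_orig = image
    adv = image + step_size * np.sign(grad)                  # signed ascent step
    adv = np.clip(adv, image_orig - eps, image_orig + eps)   # project into L_inf ball
    return np.clip(adv, 0.0, 1.0)                            # keep valid pixel range
```

Because the objective only sums entropy at the selected positions, the same pixel budget is spent pushing the few decisive decoding forks toward instability rather than being diluted across every token.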


Details

Domains
vision, nlp, multimodal
Model Types
vlm, multimodal
Threat Tags
white_box, black_box, inference_time, targeted, digital
Applications
vision-language models, multimodal AI safety, visual question answering