
Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

Xiaohan Zhao , Zhaoyi Li , Yaxin Luo , Jiacheng Cui , Zhiqiang Shen

0 citations · 41 references · arXiv (Cornell University)


Published on arXiv: 2602.17645

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

M-Attack-V2 boosts black-box adversarial attack success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming all prior LVLM attacks.

M-Attack-V2

Novel technique introduced


Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set from a semantically correlated distribution, producing a smoother, lower-variance target manifold. We further reinterpret momentum as Patch Momentum, replaying historical crop gradients; combined with a refined patch-size ensemble (PE+), this strengthens transferable directions. Together these modules form M-Attack-V2, a simple, modular enhancement over M-Attack that substantially improves transfer-based black-box attacks on frontier LVLMs: boosting success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming prior black-box LVLM attacks. Code and data are publicly available at: https://github.com/vila-lab/M-Attack-V2.
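The Multi-Crop Alignment (MCA) idea from the abstract, averaging gradients over several independently sampled local views per iteration to reduce variance, can be sketched in a toy setting. Everything here is an illustrative assumption, not the paper's implementation: `embed` is a fixed linear map standing in for a ViT surrogate, "crops" are random pixel masks standing in for local views, and the update is a PGD-style signed step under an L-infinity budget.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    # Toy surrogate encoder: a fixed linear map standing in for a ViT.
    return W @ x.ravel()

def crop_grad(x_adv, target_emb, W, crop_mask):
    # Gradient of ||embed(crop) - target_emb||^2 w.r.t. the perturbation,
    # restricted to the sampled local view (mask zeroes pixels outside it).
    x_crop = x_adv * crop_mask
    diff = embed(x_crop, W) - target_emb
    return (W.T @ (2.0 * diff)).reshape(x_adv.shape) * crop_mask

def mca_step(x_adv, target_emb, W, n_crops=8, lr=0.01, eps=0.03):
    # MCA sketch: average gradients over several independently sampled
    # local views in one iteration, then take one signed step.
    grads = []
    for _ in range(n_crops):
        mask = (rng.random(x_adv.shape) < 0.7).astype(float)  # toy "crop"
        grads.append(crop_grad(x_adv, target_emb, W, mask))
    g_avg = np.mean(grads, axis=0)
    # Signed step with an L_inf budget, as in PGD-style attacks.
    return np.clip(x_adv - lr * np.sign(g_avg), -eps, eps)
```

The point of averaging before the sign operation is that spike-like, near-orthogonal per-crop gradients largely cancel, so the signed step follows the lower-variance shared direction rather than any single crop's noise.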


Key Contributions

  • Identifies that crop-level matching in M-Attack produces high-variance, near-orthogonal gradients due to ViT translation sensitivity, destabilizing black-box optimization.
  • Introduces Multi-Crop Alignment (MCA) and Auxiliary Target Alignment (ATA) to reduce gradient variance from source and target sides respectively, reformulating local matching as an asymmetric expectation.
  • Combines Patch Momentum and refined Patch Ensemble (PE+) into M-Attack-V2, achieving SOTA black-box attack success rates on frontier LVLMs: Claude-4.0 8%→30%, Gemini-2.5-Pro 83%→97%, GPT-5 98%→100%.
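The "Patch Momentum" contribution is described as replaying historical crop gradients rather than applying classic momentum to a single averaged gradient. A minimal sketch of that replay idea, assuming a bounded buffer and an exponential decay scheme (both illustrative choices, not details from the paper):

```python
from collections import deque

import numpy as np

class PatchMomentum:
    # Sketch of gradient replay: keep a buffer of recent crop-level
    # gradients and mix decayed historical ones into the current update.
    # The capacity and decay values here are illustrative assumptions.
    def __init__(self, capacity=16, decay=0.9):
        self.buffer = deque(maxlen=capacity)
        self.decay = decay

    def update(self, crop_grads):
        # Store this iteration's crop gradients for later replay.
        for g in crop_grads:
            self.buffer.append(g)
        # Combine gradients newest-first with exponentially decaying weights.
        combined = np.zeros_like(crop_grads[0])
        weight = 1.0
        for g in reversed(self.buffer):
            combined += weight * g
            weight *= self.decay
        return combined / len(self.buffer)
```

Replaying crop gradients (instead of a single smoothed tensor) preserves the per-view structure across iterations, which is what lets the history reinforce transferable directions rather than just damping noise.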

🛡️ Threat Analysis

Input Manipulation Attack

The core contribution is crafting adversarial image perturbations that cause LVLMs to produce targeted harmful outputs at inference time: gradients are computed on white-box surrogate models, and the resulting adversarial inputs transfer to manipulate the behavior of black-box LVLMs.


Details

Domains
vision · multimodal · nlp
Model Types
vlm · llm · transformer
Threat Tags
black_box · inference_time · targeted · digital
Datasets
Custom adversarial evaluation sets targeting Claude-4.0, Gemini-2.5-Pro, GPT-5
Applications
vision-language models · image captioning · visual question answering · multimodal AI safety