Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, Zhiqiang Shen
Published on arXiv
2602.17645
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
M-Attack-V2 boosts black-box adversarial attack success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming all prior LVLM attacks.
M-Attack-V2
Novel technique introduced
Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set from a semantically correlated distribution, producing a smoother, lower-variance target manifold. We further reinterpret momentum as Patch Momentum, replaying historical crop gradients; combined with a refined patch-size ensemble (PE+), this strengthens transferable directions. Together these modules form M-Attack-V2, a simple, modular enhancement over M-Attack that substantially improves transfer-based black-box attacks on frontier LVLMs: boosting success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming prior black-box LVLM attacks. Code and data are publicly available at: https://github.com/vila-lab/M-Attack-V2.
Key Contributions
- Identifies that crop-level matching in M-Attack produces high-variance, near-orthogonal gradients due to ViT translation sensitivity, destabilizing black-box optimization.
- Introduces Multi-Crop Alignment (MCA) and Auxiliary Target Alignment (ATA) to reduce gradient variance from source and target sides respectively, reformulating local matching as an asymmetric expectation.
- Combines Patch Momentum and refined Patch Ensemble (PE+) into M-Attack-V2, achieving SOTA black-box attack success rates on frontier LVLMs: Claude-4.0 8%→30%, Gemini-2.5-Pro 83%→97%, GPT-5 98%→100%.
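The variance-reduction idea behind MCA can be sketched numerically: averaging gradients from several independently sampled local views per iteration yields a lower-variance update than a single crop. The sketch below is a toy illustration only, not the paper's implementation; the mean-pixel "embedding", the loss, and all function names (`random_crop`, `crop_gradient`, `mca_gradient`) are illustrative assumptions standing in for a ViT surrogate and its feature-matching loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    # Sample a random square window of the image (a "local view").
    h, w = img.shape
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size], (y, x)

def crop_gradient(src, tgt_feat, size):
    # Toy surrogate: "embedding" of a crop = its mean pixel value.
    # Gradient of 0.5 * (mean(crop) - tgt_feat)^2 w.r.t. the full image
    # is nonzero only inside the sampled window -- a spike-like gradient.
    crop, (y, x) = random_crop(src, size)
    g = np.zeros_like(src)
    g[y:y + size, x:x + size] = (crop.mean() - tgt_feat) / crop.size
    return g

def mca_gradient(src, tgt_feat, size, n_views=8):
    # Multi-Crop Alignment (sketch): average gradients over several
    # independently sampled local views to reduce estimator variance.
    grads = [crop_gradient(src, tgt_feat, size) for _ in range(n_views)]
    return np.mean(grads, axis=0)

# Compare per-pixel variance of the single-crop vs. averaged estimator
# over repeated draws: averaging n_views i.i.d. samples shrinks variance.
src = rng.normal(size=(32, 32))
var_single = np.var([crop_gradient(src, 0.0, 16) for _ in range(200)], axis=0).mean()
var_mca = np.var([mca_gradient(src, 0.0, 16, 8) for _ in range(200)], axis=0).mean()
print(var_mca < var_single)
```

Under this toy model, the averaged estimator's variance drops roughly in proportion to the number of views, which mirrors the paper's claim that MCA stabilizes the otherwise high-variance, near-orthogonal per-crop gradients.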
🛡️ Threat Analysis
The core contribution is crafting adversarial image perturbations that cause LVLMs to produce targeted harmful outputs at inference time: gradient-based transfer attacks, computed against white-box surrogate models, yield adversarial inputs that manipulate black-box LVLM behavior.