Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, Zhiqiang Shen
Published on arXiv
2602.17645
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
M-Attack-V2 boosts black-box adversarial attack success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming all prior LVLM attacks.
M-Attack-V2
Novel technique introduced
Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set from a semantically correlated distribution, producing a smoother, lower-variance target manifold. We further reinterpret momentum as Patch Momentum, replaying historical crop gradients; combined with a refined patch-size ensemble (PE+), this strengthens transferable directions. Together these modules form M-Attack-V2, a simple, modular enhancement over M-Attack that substantially improves transfer-based black-box attacks on frontier LVLMs: boosting success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming prior black-box LVLM attacks. Code and data are publicly available at: https://github.com/vila-lab/M-Attack-V2.
Key Contributions
- Identifies that crop-level matching in M-Attack produces high-variance, near-orthogonal gradients due to ViT translation sensitivity, destabilizing black-box optimization.
- Introduces Multi-Crop Alignment (MCA) and Auxiliary Target Alignment (ATA) to reduce gradient variance from source and target sides respectively, reformulating local matching as an asymmetric expectation.
- Combines Patch Momentum and refined Patch Ensemble (PE+) into M-Attack-V2, achieving SOTA black-box attack success rates on frontier LVLMs: Claude-4.0 8%→30%, Gemini-2.5-Pro 83%→97%, GPT-5 98%→100%.
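The variance-reduction idea behind MCA can be sketched numerically: averaging gradients from several independently sampled local views per iteration yields a lower-variance update than a single crop. The sketch below is a toy illustration only, not the paper's implementation; the mean-pixel "embedding", the loss, and all function names (`random_crop`, `crop_gradient`, `mca_gradient`) are illustrative assumptions standing in for a ViT surrogate and its feature-matching loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    # Sample a random square window of the image (a "local view").
    h, w = img.shape
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size], (y, x)

def crop_gradient(src, tgt_feat, size):
    # Toy surrogate: "embedding" of a crop = its mean pixel value.
    # Gradient of 0.5 * (mean(crop) - tgt_feat)^2 w.r.t. the full image
    # is nonzero only inside the sampled window -- a spike-like gradient.
    crop, (y, x) = random_crop(src, size)
    g = np.zeros_like(src)
    g[y:y + size, x:x + size] = (crop.mean() - tgt_feat) / crop.size
    return g

def mca_gradient(src, tgt_feat, size, n_views=8):
    # Multi-Crop Alignment (sketch): average gradients over several
    # independently sampled local views to reduce estimator variance.
    grads = [crop_gradient(src, tgt_feat, size) for _ in range(n_views)]
    return np.mean(grads, axis=0)

# Compare per-pixel variance of the single-crop vs. averaged estimator
# over repeated draws: averaging n_views i.i.d. samples shrinks variance.
src = rng.normal(size=(32, 32))
var_single = np.var([crop_gradient(src, 0.0, 16) for _ in range(200)], axis=0).mean()
var_mca = np.var([mca_gradient(src, 0.0, 16, 8) for _ in range(200)], axis=0).mean()
print(var_mca < var_single)
```

Under this toy model, the averaged estimator's variance drops roughly in proportion to the number of views, which mirrors the paper's claim that MCA stabilizes the otherwise high-variance, near-orthogonal per-crop gradients.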
🛡️ Threat Analysis
The core contribution is crafting adversarial image perturbations that cause LVLMs to produce targeted harmful outputs at inference time: gradient-based transfer attacks, computed against white-box surrogate models, yield adversarial inputs that manipulate black-box LVLM behavior.