PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention
Hefei Mei 1, Zirui Wang 1, Chang Xu 2, Jianyuan Guo 1,2, Minjing Dong 1
Published on arXiv (arXiv:2602.19418)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
PA-Attack achieves an average 75.1% score reduction rate (SRR) across diverse downstream tasks and LVLM architectures, outperforming prior white-box and black-box baselines in both effectiveness and task generalization.
PA-Attack (Prototype-Anchored Attentive Attack)
Novel technique introduced
Large Vision-Language Models (LVLMs) are foundational to modern multimodal applications, yet their susceptibility to adversarial attacks remains a critical concern. Prior white-box attacks rarely generalize across tasks, and black-box methods depend on expensive transfer procedures that limit efficiency. The vision encoder, standardized and often shared across LVLMs, provides a stable gray-box pivot with strong cross-model transfer. Building on this premise, we introduce PA-Attack (Prototype-Anchored Attentive Attack). PA-Attack begins with prototype-anchored guidance, which provides a stable attack direction toward a general, dissimilar prototype and thereby tackles the attribute-restricted behavior and limited task generalization of vanilla attacks. On top of this guidance, we propose a two-stage attention enhancement mechanism: (i) leveraging token-level attention scores to concentrate perturbations on critical visual tokens, and (ii) adaptively recalibrating attention weights to track the evolving attention during the adversarial process. Extensive experiments across diverse downstream tasks and LVLM architectures show that PA-Attack achieves an average 75.1% score reduction rate (SRR), demonstrating strong attack effectiveness, efficiency, and task generalization on LVLMs. Code is available at https://github.com/hefeimei06/PA-Attack.
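The abstract gives no implementation details, so the following is only a rough illustrative sketch of the prototype-anchored idea: a PGD-style L∞ attack against a toy linear stand-in for the vision encoder that pushes the adversarial image's embedding toward a fixed, dissimilar prototype embedding. Everything here (the linear encoder, the step sizes, the random prototype) is a simplifying assumption, not the paper's actual method.

```python
import numpy as np

def prototype_anchored_pgd(x, W, prototype, eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD-style L-inf attack on a toy linear encoder e = W @ x.

    Instead of merely pushing the embedding away from the clean one, each
    step ascends the cosine similarity between the adversarial embedding
    and a dissimilar `prototype` embedding (the prototype-anchored
    direction). Toy stand-in, not the paper's code.
    """
    x_adv = x.copy()
    p_norm = np.linalg.norm(prototype)
    for _ in range(steps):
        e = W @ x_adv
        e_norm = np.linalg.norm(e)
        # Analytic gradient of cos(e, prototype) with respect to e.
        grad_e = (prototype / (e_norm * p_norm)
                  - (e @ prototype) * e / (e_norm ** 3 * p_norm))
        grad_x = W.T @ grad_e                      # chain rule through the encoder
        x_adv = x_adv + alpha * np.sign(grad_x)    # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay inside the L-inf budget
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv
```

A real gray-box attack of this kind would backpropagate through the shared pretrained vision encoder rather than a linear map, and the prototype would be chosen to be general and dissimilar to the clean embedding, as the abstract describes.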
Key Contributions
- Prototype-anchored guidance that steers adversarial perturbations toward a general, dissimilar prototype to overcome attribute-restricted limitations and improve task generalization
- Two-stage attention enhancement: concentrating perturbations on high-attention visual tokens, then adaptively recalibrating attention weights to track the evolving adversarial process
- Gray-box attack framework exploiting the standardized, cross-LVLM shared vision encoder as a stable pivot, achieving 75.1% average score reduction rate across diverse tasks and architectures
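The attention-enhancement contribution can be pictured with a tiny stand-in: weight the signed gradient of each visual token by its normalized attention score, so the perturbation budget concentrates on high-attention tokens; in the full two-stage scheme the scores would be recomputed from the encoder at each iteration (the recalibration stage). The function name and shapes below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def attention_concentrated_step(grad_patches, attn_scores, alpha=2 / 255):
    """One attention-weighted perturbation step (illustrative sketch).

    grad_patches : (num_tokens, patch_dim) loss gradient per visual token
    attn_scores  : (num_tokens,) attention score per token

    The signed gradient is scaled by normalized attention weights, so the
    update concentrates on tokens the encoder attends to most. In the full
    method these scores would be recalibrated each iteration.
    """
    weights = attn_scores / attn_scores.sum()   # normalize to a distribution
    return alpha * np.sign(grad_patches) * weights[:, None]
```

Applied inside a PGD loop, this step would replace the uniform `alpha * sign(grad)` update, so low-attention background patches receive proportionally smaller perturbations.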
🛡️ Threat Analysis
PA-Attack crafts adversarial visual perturbations that target LVLM vision encoders at inference time, making it a gradient-based (gray-box) input-manipulation attack. It uses token-level attention scores to concentrate perturbations on critical visual tokens and prototype-anchored guidance to maximize performance degradation across downstream tasks.