Latent Danger Zone: Distilling Unified Attention for Cross-Architecture Black-box Attacks
Yang Li 1, Chenyu Wang 1, Tingrui Wang 1, Yongwei Wang 2, Haonan Li 1, Zhunga Liu 1, Quan Pan 1
Published on arXiv (arXiv:2509.19044)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
JAD achieves superior cross-architecture attack transferability between CNNs and ViTs while drastically reducing query counts to approximately one forward pass per adversarial example.
JAD (Joint Attention Distillation)
Novel technique introduced
Black-box adversarial attacks remain challenging due to limited access to model internals. Existing methods often depend on specific network architectures or require numerous queries, resulting in limited cross-architecture transferability and high query costs. To address these limitations, we propose JAD, a latent diffusion model framework for black-box adversarial attacks. JAD generates adversarial examples by leveraging a latent diffusion model guided by attention maps distilled from both a convolutional neural network (CNN) and a Vision Transformer (ViT). By focusing on image regions that are commonly sensitive across architectures, this approach crafts adversarial perturbations that transfer effectively between different model types. This joint attention distillation strategy makes JAD architecture-agnostic, achieving superior attack generalization across diverse models. Moreover, the generative nature of the diffusion framework yields high sample-generation efficiency by reducing reliance on iterative queries. Experiments demonstrate that JAD offers improved attack generalization, generation efficiency, and cross-architecture transferability compared to existing methods, providing a promising and effective paradigm for black-box adversarial attacks.
Key Contributions
- JAD framework that fuses CNN and ViT attention maps into a unified saliency objective to guide a latent diffusion adversarial generator toward regions commonly vulnerable across architectures
- Architecture-agnostic adversarial example generation that transfers effectively between CNNs and Vision Transformers without iterative per-input query optimization
- Demonstrates superior cross-architecture transferability and generation efficiency over existing methods including CDMA and integrated-gradient-based attacks
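The fusion step in the first contribution can be sketched as below: a minimal, hypothetical illustration of combining a CNN attention map and a ViT attention map into one saliency target that a generator's attention is distilled toward. The min-max normalization, convex-combination fusion, MSE distillation loss, and the weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def normalize(attn: np.ndarray) -> np.ndarray:
    # Min-max normalize an attention map to [0, 1] so CNN and ViT
    # maps are comparable before fusion.
    attn = attn - attn.min()
    return attn / (attn.max() + 1e-8)

def joint_saliency(cnn_attn: np.ndarray, vit_attn: np.ndarray,
                   alpha: float = 0.5) -> np.ndarray:
    # Hypothetical fusion: a convex combination of the two normalized
    # maps, so regions attended to by BOTH architectures score highest.
    return alpha * normalize(cnn_attn) + (1 - alpha) * normalize(vit_attn)

def distillation_loss(gen_attn: np.ndarray, cnn_attn: np.ndarray,
                      vit_attn: np.ndarray, alpha: float = 0.5) -> float:
    # Illustrative distillation objective: MSE between the generator's
    # attention and the fused CNN+ViT saliency target.
    target = joint_saliency(cnn_attn, vit_attn, alpha)
    return float(np.mean((normalize(gen_attn) - target) ** 2))
```

In this sketch, minimizing `distillation_loss` pushes the generator's perturbations toward regions both architectures deem salient, which is the intuition behind the claimed cross-architecture transferability.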
🛡️ Threat Analysis
The core contribution is crafting adversarial inputs that cause misclassification across diverse model architectures at inference time: a transfer-based black-box adversarial example attack that uses latent diffusion guided by joint CNN+ViT attention distillation.