Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
Roie Kazoom, Alon Goldberg, Hodaya Cohen, Ofer Hadar
Published on arXiv (2509.22836)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves attack success rates (ASR) and target-class success (TCS) consistently exceeding 99% across DenseNet-121, ResNet-50, ViT-B/16, and Swin-B/16 in a black-box setting, outperforming prior white-box and untargeted baselines.
Adversarial patch attacks pose a severe threat to deep neural networks, yet most existing approaches rely on unrealistic white-box assumptions, pursue untargeted objectives, or produce visually conspicuous patches that limit real-world applicability. In this work, we introduce a novel framework for fully controllable adversarial patch generation, where the attacker can freely choose both the input image x and the target class y_target, thereby dictating the exact misclassification outcome. Our method combines a generative U-Net design with Grad-CAM-guided patch placement, enabling semantic-aware localization that maximizes attack effectiveness while preserving visual realism. Extensive experiments across convolutional networks (DenseNet-121, ResNet-50) and vision transformers (ViT-B/16, Swin-B/16, among others) demonstrate that our approach achieves state-of-the-art performance across all settings, with attack success rates (ASR) and target-class success (TCS) consistently exceeding 99%. Importantly, our method not only outperforms prior white-box attacks and untargeted baselines, but also surpasses existing non-realistic approaches that produce detectable artifacts. By simultaneously ensuring realism, targeted control, and black-box applicability (the three most challenging dimensions of patch-based attacks), our framework establishes a new benchmark for adversarial robustness research, bridging the gap between theoretical attack strength and practical stealthiness.
Key Contributions
- Targeted conditional GAN framework (U-Net generator) that synthesizes adversarial patches conditioned on both the input image and an attacker-specified target class, enabling full control over the victim's predicted label.
- Grad-CAM-guided patch placement using a surrogate ResNet-50 to localize patches at semantically salient regions without querying victim model gradients, enabling a fully black-box attack.
- Multi-objective loss combining adversarial loss, pixel-level perceptual loss, and deep feature consistency via frozen VGG16 to jointly maximize attack success and visual realism.
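The placement step above can be sketched in a few lines. For a surrogate whose head is global average pooling followed by a linear classifier, the Grad-CAM map for a class reduces to the class activation map: the last-layer feature maps weighted by that class's classifier weights, then rectified. The toy NumPy sketch below uses random feature maps and weights in place of a real ResNet-50 surrogate (an assumption for illustration; the paper's surrogate, patch geometry, and scoring may differ) and picks the patch-sized window with the most heatmap energy.

```python
import numpy as np

def gradcam_placement(feature_maps, class_weights, target_class, patch_size):
    """Pick the most class-salient patch location from a CAM/Grad-CAM heatmap.

    feature_maps:  (K, H, W) activations from the surrogate's last conv layer.
    class_weights: (num_classes, K) final linear-classifier weights.
    For a GAP + linear head, Grad-CAM reduces to this weighted sum (CAM).
    """
    # Weighted combination of feature maps, then ReLU (Grad-CAM's rectification).
    cam = np.tensordot(class_weights[target_class], feature_maps, axes=1)
    cam = np.maximum(cam, 0.0)

    # Sum heatmap energy inside every patch-sized window and keep the maximum,
    # so the patch lands on the region the surrogate deems most class-relevant.
    H, W = cam.shape
    best, best_pos = -1.0, (0, 0)
    for i in range(H - patch_size + 1):
        for j in range(W - patch_size + 1):
            score = cam[i:i + patch_size, j:j + patch_size].sum()
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, cam

# Toy stand-in for a surrogate: 8 random 14x14 feature maps, 10 classes.
rng = np.random.default_rng(0)
feats = rng.random((8, 14, 14))
weights = rng.standard_normal((10, 8))
(top, left), cam = gradcam_placement(feats, weights, target_class=3, patch_size=4)
print("patch placed at", (top, left))
```

In the fully black-box setting described above, only the surrogate (not the victim model) is ever queried for gradients or activations.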
🛡️ Threat Analysis
The core contribution is a novel adversarial patch attack that causes targeted misclassification at inference time, a canonical Input Manipulation Attack. The method uses a U-Net conditional GAN for patch synthesis and Grad-CAM-guided placement on a surrogate model to maximize attack success while preserving visual realism under black-box constraints.
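The multi-objective training signal listed under Key Contributions can be sketched as a weighted sum of three terms: a targeted cross-entropy pushing the prediction toward the attacker's class, a pixel-level fidelity term, and a deep-feature consistency term. The sketch below is a minimal NumPy version assuming hypothetical weights `lam_adv`, `lam_pix`, `lam_feat` and substituting a fixed random linear projection for the frozen VGG16 feature extractor; the paper's actual weighting and feature network may differ.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def patch_loss(logits, target_class, x_adv, x_clean, feat_proj,
               lam_adv=1.0, lam_pix=0.1, lam_feat=0.1):
    """Combined objective: targeted adversarial loss + pixel fidelity
    + deep-feature consistency (frozen VGG16 stood in by a fixed projection)."""
    # Targeted adversarial term: cross-entropy toward the attacker-chosen class.
    l_adv = -np.log(softmax(logits)[target_class] + 1e-12)
    # Pixel-level term: keep the patched image close to the clean input.
    l_pix = np.mean((x_adv - x_clean) ** 2)
    # Feature consistency: match frozen deep features of clean vs. patched image.
    f_adv = feat_proj @ x_adv.ravel()
    f_clean = feat_proj @ x_clean.ravel()
    l_feat = np.mean((f_adv - f_clean) ** 2)
    return lam_adv * l_adv + lam_pix * l_pix + lam_feat * l_feat

rng = np.random.default_rng(1)
x = rng.random((3, 8, 8))                    # toy "image"
x_adv = x.copy()
x_adv[:, 2:5, 2:5] = rng.random((3, 3, 3))   # toy 3x3 patch pasted in
proj = rng.standard_normal((16, x.size)) / np.sqrt(x.size)
logits = rng.standard_normal(10)             # toy victim logits
loss = patch_loss(logits, target_class=7, x_adv=x_adv, x_clean=x, feat_proj=proj)
print("combined loss:", loss)
```

Minimizing the first term drives targeted misclassification, while the second and third keep the patched image visually and semantically close to the original, matching the realism-vs-success trade-off the paper targets.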