Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
Roie Kazoom, Alon Goldberg, Hodaya Cohen, Ofer Hadar
Published on arXiv (2509.22836)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves attack success rates (ASR) and target-class success (TCS) consistently exceeding 99% across DenseNet-121, ResNet-50, ViT-B/16, and Swin-B/16 in a black-box setting, outperforming prior white-box and untargeted baselines.
Adversarial patch attacks pose a severe threat to deep neural networks, yet most existing approaches rely on unrealistic white-box assumptions, pursue untargeted objectives, or produce visually conspicuous patches that limit real-world applicability. In this work, we introduce a novel framework for fully controllable adversarial patch generation, where the attacker can freely choose both the input image x and the target class y_target, thereby dictating the exact misclassification outcome. Our method combines a generative U-Net design with Grad-CAM-guided patch placement, enabling semantic-aware localization that maximizes attack effectiveness while preserving visual realism. Extensive experiments across convolutional networks (DenseNet-121, ResNet-50) and vision transformers (ViT-B/16, Swin-B/16, among others) demonstrate that our approach achieves state-of-the-art performance across all settings, with attack success rates (ASR) and target-class success (TCS) consistently exceeding 99%. Importantly, our method not only outperforms prior white-box attacks and untargeted baselines, but also surpasses existing non-realistic approaches that produce detectable artifacts. By simultaneously ensuring realism, targeted control, and black-box applicability (the three most challenging dimensions of patch-based attacks), our framework establishes a new benchmark for adversarial robustness research, bridging the gap between theoretical attack strength and practical stealthiness.
Key Contributions
- Targeted conditional GAN framework (U-Net generator) that synthesizes adversarial patches conditioned on both the input image and an attacker-specified target class, enabling full control over the victim's predicted label.
- Grad-CAM-guided patch placement using a surrogate ResNet-50 to localize patches at semantically salient regions without querying victim model gradients, enabling a fully black-box attack.
- Multi-objective loss combining adversarial loss, pixel-level perceptual loss, and deep feature consistency via frozen VGG16 to jointly maximize attack success and visual realism.
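The placement step above can be sketched in a few lines. For a surrogate whose head is global average pooling followed by a linear classifier, the Grad-CAM map for a class reduces to the class activation map: the last-layer feature maps weighted by that class's classifier weights, then rectified. The toy NumPy sketch below uses random feature maps and weights in place of a real ResNet-50 surrogate (an assumption for illustration; the paper's surrogate, patch geometry, and scoring may differ) and picks the patch-sized window with the most heatmap energy.

```python
import numpy as np

def gradcam_placement(feature_maps, class_weights, target_class, patch_size):
    """Pick the most class-salient patch location from a CAM/Grad-CAM heatmap.

    feature_maps:  (K, H, W) activations from the surrogate's last conv layer.
    class_weights: (num_classes, K) final linear-classifier weights.
    For a GAP + linear head, Grad-CAM reduces to this weighted sum (CAM).
    """
    # Weighted combination of feature maps, then ReLU (Grad-CAM's rectification).
    cam = np.tensordot(class_weights[target_class], feature_maps, axes=1)
    cam = np.maximum(cam, 0.0)

    # Sum heatmap energy inside every patch-sized window and keep the maximum,
    # so the patch lands on the region the surrogate deems most class-relevant.
    H, W = cam.shape
    best, best_pos = -1.0, (0, 0)
    for i in range(H - patch_size + 1):
        for j in range(W - patch_size + 1):
            score = cam[i:i + patch_size, j:j + patch_size].sum()
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, cam

# Toy stand-in for a surrogate: 8 random 14x14 feature maps, 10 classes.
rng = np.random.default_rng(0)
feats = rng.random((8, 14, 14))
weights = rng.standard_normal((10, 8))
(top, left), cam = gradcam_placement(feats, weights, target_class=3, patch_size=4)
print("patch placed at", (top, left))
```

In the fully black-box setting described above, only the surrogate (not the victim model) is ever queried for gradients or activations.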
🛡️ Threat Analysis
The core contribution is a novel adversarial patch attack that causes targeted misclassification at inference time, a canonical Input Manipulation Attack. The method uses a U-Net conditional GAN for patch synthesis and Grad-CAM-guided placement on a surrogate model to maximize attack success while preserving visual realism under black-box constraints.
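The multi-objective training signal listed under Key Contributions can be sketched as a weighted sum of three terms: a targeted cross-entropy pushing the prediction toward the attacker's class, a pixel-level fidelity term, and a deep-feature consistency term. The sketch below is a minimal NumPy version assuming hypothetical weights `lam_adv`, `lam_pix`, `lam_feat` and substituting a fixed random linear projection for the frozen VGG16 feature extractor; the paper's actual weighting and feature network may differ.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def patch_loss(logits, target_class, x_adv, x_clean, feat_proj,
               lam_adv=1.0, lam_pix=0.1, lam_feat=0.1):
    """Combined objective: targeted adversarial loss + pixel fidelity
    + deep-feature consistency (frozen VGG16 stood in by a fixed projection)."""
    # Targeted adversarial term: cross-entropy toward the attacker-chosen class.
    l_adv = -np.log(softmax(logits)[target_class] + 1e-12)
    # Pixel-level term: keep the patched image close to the clean input.
    l_pix = np.mean((x_adv - x_clean) ** 2)
    # Feature consistency: match frozen deep features of clean vs. patched image.
    f_adv = feat_proj @ x_adv.ravel()
    f_clean = feat_proj @ x_clean.ravel()
    l_feat = np.mean((f_adv - f_clean) ** 2)
    return lam_adv * l_adv + lam_pix * l_pix + lam_feat * l_feat

rng = np.random.default_rng(1)
x = rng.random((3, 8, 8))                    # toy "image"
x_adv = x.copy()
x_adv[:, 2:5, 2:5] = rng.random((3, 3, 3))   # toy 3x3 patch pasted in
proj = rng.standard_normal((16, x.size)) / np.sqrt(x.size)
logits = rng.standard_normal(10)             # toy victim logits
loss = patch_loss(logits, target_class=7, x_adv=x_adv, x_clean=x, feat_proj=proj)
print("combined loss:", loss)
```

Minimizing the first term drives targeted misclassification, while the second and third keep the patched image visually and semantically close to the original, matching the realism-vs-success trade-off the paper targets.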