
VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines

Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma

1 citation · 46 references · arXiv


Published on arXiv · 2509.24891

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Latent-space poisoning in GANs and diffusion models can retain or enhance output perceptual quality while embedding hidden backdoor triggers, defeating pixel-level defenses and human visual inspection simultaneously.

VagueGAN / PoisonerNet

Novel technique introduced


Generative models such as GANs and diffusion models are widely used to synthesize photorealistic images and to support downstream creative and editing tasks. While adversarial attacks on discriminative models are well studied, attacks targeting generative pipelines, where small, stealthy perturbations in inputs lead to controlled changes in outputs, are less explored. This study introduces VagueGAN, an attack pipeline that combines a modular perturbation network, PoisonerNet, with a Generator-Discriminator pair to craft stealthy triggers that cause targeted changes in generated images. Attack efficacy is evaluated using a custom proxy metric, while stealth is analyzed through perceptual and frequency-domain measures. The transferability of the method to a modern diffusion-based pipeline is further examined through ControlNet-guided editing. Interestingly, the experiments show that poisoned outputs can display higher visual quality than clean counterparts, challenging the assumption that poisoning necessarily reduces fidelity. Unlike conventional pixel-level perturbations, latent-space poisoning in GANs and diffusion pipelines can retain or even enhance output aesthetics, exposing a blind spot in pixel-level defenses. Moreover, carefully optimized perturbations can produce consistent, stealthy effects on generator outputs while remaining visually inconspicuous, raising concerns for the integrity of image generation pipelines.
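The latent-space trigger described above can be sketched minimally. Below, a one-layer stand-in plays the role of PoisonerNet: it maps a latent vector to a perturbation that is squashed into an L∞ stealth budget before being added back to the latent input. All names, dimensions, the tanh squashing, and the epsilon value are illustrative assumptions, not the paper's implementation; in VagueGAN the perturbation network is trained adversarially against the Generator-Discriminator pair rather than initialized randomly.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 128
EPSILON = 0.05  # hypothetical L-infinity stealth budget on the trigger

def poisoner_net(z, W, b, epsilon=EPSILON):
    """Toy stand-in for PoisonerNet: a single-layer perturbation network
    whose output is squashed by tanh so the trigger stays within epsilon."""
    delta = np.tanh(z @ W + b) * epsilon  # each component bounded in (-eps, eps)
    return z + delta, delta

# Randomly initialized parameters (trained adversarially in the actual attack).
W = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
b = np.zeros(LATENT_DIM)

z_clean = rng.normal(size=LATENT_DIM)
z_poisoned, delta = poisoner_net(z_clean, W, b)

# The poisoned latent is a visually inconspicuous shift of the clean one:
# max perturbation magnitude never exceeds the stealth budget.
max_shift = float(np.max(np.abs(delta)))
```

Because the trigger lives in latent space rather than pixel space, the generator's output for `z_poisoned` need not show any pixel-level artifact, which is exactly the blind spot the abstract highlights.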


Key Contributions

  • VagueGAN: a latent-space poisoning framework pairing PoisonerNet (modular perturbation network) with a GAN adversarial training loop to craft stealthy, targeted backdoor triggers in generative pipelines
  • Demonstrates a 'beauty as stealth' paradox — poisoned GAN/diffusion outputs can exceed the visual quality of clean images, exposing a critical blind spot in pixel-level defenses that assume attacks degrade fidelity
  • First systematic study of backdoor trigger transferability from GAN-based synthesis to ControlNet-guided Stable Diffusion pipelines

🛡️ Threat Analysis

Data Poisoning Attack

The paper explicitly frames itself as both a poisoning AND backdoor attack (title and throughout), injecting stealthy perturbations into training data and latent inputs to corrupt generative pipeline behavior. Co-tagged per the rule that papers addressing both general poisoning and backdoor injection receive both ML02 and ML10.

Model Poisoning

VagueGAN's core contribution is PoisonerNet, which embeds hidden triggers into generative model latent inputs causing targeted, controlled changes in generated outputs while remaining visually inconspicuous — textbook backdoor/trojan behavior. The paper explicitly evaluates 'backdoor success' via a proxy metric and studies 'backdoor transferability' from GANs to diffusion pipelines.
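The 'backdoor success via a proxy metric' evaluation can be illustrated with a toy sketch: a proxy that checks whether an attacker's target pattern appears in a fixed region of the generated outputs, alongside PSNR as a pixel-level stealth measure. The region, tolerance, and metric definitions here are hypothetical stand-ins; the paper's custom proxy metric and its perceptual/frequency-domain stealth measures are not reproduced.

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means more similar images."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def attack_success_proxy(outputs, target_patch, region, tol=0.1):
    """Illustrative proxy: fraction of outputs whose pixels in `region`
    land within `tol` (mean absolute error) of the target patch."""
    r0, r1, c0, c1 = region
    hits = [np.mean(np.abs(img[r0:r1, c0:c1] - target_patch)) < tol
            for img in outputs]
    return float(np.mean(hits))

rng = np.random.default_rng(1)
clean = rng.random((8, 32, 32))                       # 8 toy "generated" images
poisoned = np.clip(clean + rng.normal(scale=0.01, size=clean.shape), 0, 1)
# Implant a faint target pattern in a corner of every poisoned output.
target = np.full((4, 4), 0.5)
poisoned[:, :4, :4] = target + rng.normal(scale=0.02, size=(8, 4, 4))

asr = attack_success_proxy(poisoned, target, (0, 4, 0, 4))
stealth_db = psnr(clean[0], poisoned[0])
```

High `asr` together with high `stealth_db` is the failure mode the threat analysis flags: the backdoor fires reliably while pixel-level similarity metrics, and by extension defenses built on them, report nothing unusual.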


Details

Domains
vision · generative
Model Types
gan · diffusion
Threat Tags
white_box · training_time · targeted · digital
Applications
image generation · generative image editing · text-to-image synthesis