
VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines

Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma

1 citation · 46 references · arXiv


Published on arXiv · 2509.24891

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Latent-space poisoning in GANs and diffusion models can retain or enhance output perceptual quality while embedding hidden backdoor triggers, defeating pixel-level defenses and human visual inspection simultaneously.

VagueGAN / PoisonerNet

Novel technique introduced


Generative models such as GANs and diffusion models are widely used to synthesize photorealistic images and to support downstream creative and editing tasks. While adversarial attacks on discriminative models are well studied, attacks targeting generative pipelines, where small, stealthy perturbations in inputs lead to controlled changes in outputs, are less explored. This study introduces VagueGAN, an attack pipeline that combines a modular perturbation network, PoisonerNet, with a Generator-Discriminator pair to craft stealthy triggers that cause targeted changes in generated images. Attack efficacy is evaluated using a custom proxy metric, while stealth is analyzed through perceptual and frequency-domain measures. The transferability of the method to a modern diffusion-based pipeline is further examined through ControlNet-guided editing. Interestingly, the experiments show that poisoned outputs can display higher visual quality than clean counterparts, challenging the assumption that poisoning necessarily reduces fidelity. Unlike conventional pixel-level perturbations, latent-space poisoning in GANs and diffusion pipelines can retain or even enhance output aesthetics, exposing a blind spot in pixel-level defenses. Moreover, carefully optimized perturbations can produce consistent, stealthy effects on generator outputs while remaining visually inconspicuous, raising concerns for the integrity of image generation pipelines.
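The latent-space trigger described above can be sketched minimally. Below, a one-layer stand-in plays the role of PoisonerNet: it maps a latent vector to a perturbation that is squashed into an L∞ stealth budget before being added back to the latent input. All names, dimensions, the tanh squashing, and the epsilon value are illustrative assumptions, not the paper's implementation; in VagueGAN the perturbation network is trained adversarially against the Generator-Discriminator pair rather than initialized randomly.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 128
EPSILON = 0.05  # hypothetical L-infinity stealth budget on the trigger

def poisoner_net(z, W, b, epsilon=EPSILON):
    """Toy stand-in for PoisonerNet: a single-layer perturbation network
    whose output is squashed by tanh so the trigger stays within epsilon."""
    delta = np.tanh(z @ W + b) * epsilon  # each component bounded in (-eps, eps)
    return z + delta, delta

# Randomly initialized parameters (trained adversarially in the actual attack).
W = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
b = np.zeros(LATENT_DIM)

z_clean = rng.normal(size=LATENT_DIM)
z_poisoned, delta = poisoner_net(z_clean, W, b)

# The poisoned latent is a visually inconspicuous shift of the clean one:
# max perturbation magnitude never exceeds the stealth budget.
max_shift = float(np.max(np.abs(delta)))
```

Because the trigger lives in latent space rather than pixel space, the generator's output for `z_poisoned` need not show any pixel-level artifact, which is exactly the blind spot the abstract highlights.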


Key Contributions

  • VagueGAN: a latent-space poisoning framework pairing PoisonerNet (modular perturbation network) with a GAN adversarial training loop to craft stealthy, targeted backdoor triggers in generative pipelines
  • Demonstrates a 'beauty as stealth' paradox — poisoned GAN/diffusion outputs can exceed the visual quality of clean images, exposing a critical blind spot in pixel-level defenses that assume attacks degrade fidelity
  • First systematic study of backdoor trigger transferability from GAN-based synthesis to ControlNet-guided Stable Diffusion pipelines

🛡️ Threat Analysis

Data Poisoning Attack

The paper explicitly frames itself as both a poisoning AND backdoor attack (title and throughout), injecting stealthy perturbations into training data and latent inputs to corrupt generative pipeline behavior. Co-tagged per the rule that papers addressing both general poisoning and backdoor injection receive both ML02 and ML10.

Model Poisoning

VagueGAN's core contribution is PoisonerNet, which embeds hidden triggers into generative model latent inputs causing targeted, controlled changes in generated outputs while remaining visually inconspicuous — textbook backdoor/trojan behavior. The paper explicitly evaluates 'backdoor success' via a proxy metric and studies 'backdoor transferability' from GANs to diffusion pipelines.
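The 'backdoor success via a proxy metric' evaluation can be illustrated with a toy sketch: a proxy that checks whether an attacker's target pattern appears in a fixed region of the generated outputs, alongside PSNR as a pixel-level stealth measure. The region, tolerance, and metric definitions here are hypothetical stand-ins; the paper's custom proxy metric and its perceptual/frequency-domain stealth measures are not reproduced.

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means more similar images."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def attack_success_proxy(outputs, target_patch, region, tol=0.1):
    """Illustrative proxy: fraction of outputs whose pixels in `region`
    land within `tol` (mean absolute error) of the target patch."""
    r0, r1, c0, c1 = region
    hits = [np.mean(np.abs(img[r0:r1, c0:c1] - target_patch)) < tol
            for img in outputs]
    return float(np.mean(hits))

rng = np.random.default_rng(1)
clean = rng.random((8, 32, 32))                       # 8 toy "generated" images
poisoned = np.clip(clean + rng.normal(scale=0.01, size=clean.shape), 0, 1)
# Implant a faint target pattern in a corner of every poisoned output.
target = np.full((4, 4), 0.5)
poisoned[:, :4, :4] = target + rng.normal(scale=0.02, size=(8, 4, 4))

asr = attack_success_proxy(poisoned, target, (0, 4, 0, 4))
stealth_db = psnr(clean[0], poisoned[0])
```

High `asr` together with high `stealth_db` is the failure mode the threat analysis flags: the backdoor fires reliably while pixel-level similarity metrics, and by extension defenses built on them, report nothing unusual.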


Details

Domains
vision · generative
Model Types
gan · diffusion
Threat Tags
white_box · training_time · targeted · digital
Applications
image generation · generative image editing · text-to-image synthesis