
PUREVQ-GAN: Defending Data Poisoning Attacks through Vector-Quantized Bottlenecks

Alexander Branch 1, Omead Pooladzandi 2, Radin Khosraviani, Sunay Gajanan Bhat 1, Jeffrey Jiang 1, Gregory Pottie 1


Published on arXiv: 2509.25792

Data Poisoning Attack (OWASP ML Top 10 — ML02)

Model Poisoning (OWASP ML Top 10 — ML10)

Key Finding

Achieves 0% poison success rate against Gradient Matching and Bullseye Polytope attacks and 1.64% against Narcissus on CIFAR-10 while maintaining 91-95% clean accuracy, with 50x speedup over diffusion-based defenses.

PureVQ-GAN

Novel technique introduced


We introduce PureVQ-GAN, a defense against data poisoning that forces backdoor triggers through a discrete bottleneck using a Vector-Quantized VAE with a GAN discriminator. By quantizing poisoned images through a learned codebook, PureVQ-GAN destroys fine-grained trigger patterns while preserving semantic content. A GAN discriminator ensures outputs match the natural image distribution, preventing reconstruction of out-of-distribution perturbations. On CIFAR-10, PureVQ-GAN achieves a 0% poison success rate (PSR) against Gradient Matching and Bullseye Polytope attacks, and 1.64% against Narcissus, while maintaining 91-95% clean accuracy. Unlike diffusion-based defenses, which require hundreds of iterative refinement steps, PureVQ-GAN is over 50x faster, making it practical for real training pipelines.
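The core mechanism described here, snapping continuous features to the nearest entry of a learned codebook, can be illustrated with a toy sketch. This is not the paper's implementation: it uses NumPy only, operates on made-up latent vectors rather than a full VQ-GAN encoder/decoder, and the function names are hypothetical.

```python
import numpy as np

def vq_bottleneck(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents:  (N, D) array of encoder outputs
    codebook: (K, D) array of learned code vectors
    Returns quantized latents (N, D). Any perturbation smaller than
    the codebook's resolution is snapped away, which is how a discrete
    bottleneck can destroy fine-grained trigger patterns.
    """
    # Squared distances between every latent and every code: (N, K)
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)   # index of the nearest code per latent
    return codebook[idx]     # discrete reconstruction

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))          # K=16 codes, D=8 dims (toy scale)
clean = codebook[rng.integers(0, 16, 32)]    # latents lying exactly on the codebook
# Small additive perturbation standing in for a poisoning trigger:
poisoned = clean + 0.05 * rng.normal(size=clean.shape)

purified = vq_bottleneck(poisoned, codebook)
assert np.allclose(purified, clean)  # quantization removed the perturbation
```

In the actual system the codebook is learned jointly with an encoder and a GAN-trained decoder, so quantization happens in latent space and the discriminator keeps reconstructions on the natural image manifold; the sketch only shows the nearest-neighbor snap itself.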


Key Contributions

  • VQ-GAN preprocessing pipeline that forces poisoned training images through a discrete codebook bottleneck, destroying fine-grained trigger and perturbation patterns
  • GAN discriminator component that ensures purified outputs remain on the natural image manifold, preventing reconstruction of out-of-distribution poisoning signals
  • Achieves 0% poison success rate against Gradient Matching and Bullseye Polytope with 91-95% clean accuracy, while being 50x faster than diffusion-based purification defenses

🛡️ Threat Analysis

Data Poisoning Attack

Defends against clean-label data poisoning attacks (Gradient Matching, Bullseye Polytope) by purifying training data through a discrete VQ bottleneck before model training.

Model Poisoning

Explicitly defends against backdoor/trojan attacks with trigger patterns (Narcissus) by destroying fine-grained perturbations through vector quantization, preventing trigger-based behavior from being learned.
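Both threat entries place the defense at the same point in the pipeline: purify the (possibly poisoned) training set once, before any learning. A minimal sketch of that placement, with a hypothetical `purify` function standing in for the VQ-GAN encode/quantize/decode pass (here simple uniform pixel quantization, NOT the paper's method):

```python
import numpy as np

def purify(image, levels=8):
    """Stand-in for the VQ-GAN pass: coarse uniform quantization.

    A real PureVQ-GAN encodes the image, snaps latents to a learned
    codebook, and decodes with a GAN-trained decoder; this toy version
    only illustrates where a 'discrete bottleneck' preprocessing step
    sits relative to training.
    """
    q = np.round(image * (levels - 1)) / (levels - 1)
    return q.clip(0.0, 1.0)

# Purify the training set up front, then train the downstream
# classifier only on the purified images.
train_images = np.random.default_rng(1).random((4, 32, 32, 3))  # toy CIFAR-like batch
purified_set = np.stack([purify(x) for x in train_images])
# model.fit(purified_set, labels)  # downstream training is otherwise unchanged
```

The point of this structure is that the defense is model-agnostic: the classifier and its training loop need no modification, which is what makes the reported 50x speedup over iterative diffusion purification matter in practice.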


Details

Domains
vision
Model Types
gan, cnn
Threat Tags
training_time, digital
Datasets
CIFAR-10
Applications
image classification