
Expanding the Role of Diffusion Models for Robust Classifier Training

Pin-Han Huang, Shang-Tse Chen, Hsuan-Tien Lin

0 citations · 72 references · arXiv (Cornell University)

Published on arXiv · 2602.19931

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Jointly leveraging diffusion model representations and synthetic data during adversarial training consistently improves robust accuracy across CIFAR-10, CIFAR-100, and ImageNet over baselines that only use synthetic data.

Diffusion Representation Alignment (DRA)

Novel technique introduced


Incorporating diffusion-generated synthetic data into adversarial training (AT) has been shown to substantially improve the training of robust image classifiers. In this work, we extend the role of diffusion models beyond merely generating synthetic data, examining whether their internal representations, which encode meaningful features of the data, can provide additional benefits for robust classifier training. Through systematic experiments, we show that diffusion models offer representations that are both diverse and partially robust, and that explicitly incorporating diffusion representations as an auxiliary learning signal during AT consistently improves robustness across settings. Furthermore, our representation analysis indicates that incorporating diffusion models into AT encourages more disentangled features, while diffusion representations and diffusion-generated synthetic data play complementary roles in shaping representations. Experiments on CIFAR-10, CIFAR-100, and ImageNet validate these findings, demonstrating the effectiveness of jointly leveraging diffusion representations and synthetic data within AT.


Key Contributions

  • Demonstrates that diffusion model internal representations are both diverse and partially robust, making them valuable beyond synthetic data generation
  • Proposes Diffusion Representation Alignment (DRA), which uses an auxiliary projection head to align classifier representations with diffusion model features during adversarial training
  • Shows that diffusion representations and diffusion-generated synthetic data play complementary roles, and their joint use encourages more disentangled classifier features
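
The auxiliary alignment described in the DRA bullet can be sketched as an extra term added to the adversarial-training loss: a projection of the classifier's features is pulled toward the frozen diffusion features. The cosine-similarity form, the weighting coefficient `lam`, and the function names below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def cosine_alignment_loss(clf_feat, diff_feat, eps=1e-8):
    """Auxiliary DRA-style term: 1 - cos(projected classifier feature,
    frozen diffusion feature). Zero when the directions match."""
    dot = sum(a * b for a, b in zip(clf_feat, diff_feat))
    na = math.sqrt(sum(a * a for a in clf_feat)) + eps
    nb = math.sqrt(sum(b * b for b in diff_feat)) + eps
    return 1.0 - dot / (na * nb)

def dra_total_loss(ce_loss, clf_feat, diff_feat, lam=0.5):
    """Adversarial-training classification loss plus the alignment term,
    weighted by a hypothetical coefficient `lam`."""
    return ce_loss + lam * cosine_alignment_loss(clf_feat, diff_feat)
```

In practice `clf_feat` would come from a small projection head on the robust classifier and `diff_feat` from the diffusion model's internal activations for the same (possibly adversarial) input, with gradients flowing only through the classifier side.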

🛡️ Threat Analysis

Input Manipulation Attack

The paper directly targets defense against adversarial input attacks: it proposes a technique (DRA) that strengthens adversarial training, the canonical defense against inference-time input manipulation. Robustness is evaluated via standard robust-accuracy benchmarks on CIFAR-10, CIFAR-100, and ImageNet.
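
As context for the input-manipulation threat class, the sketch below shows a minimal PGD-style L∞ attack on a toy linear scorer: it takes signed-gradient steps to flip the score and projects each step back into an ε-ball around the clean input. The toy model, step size, radius, and the `pgd_linf` name are hypothetical illustrations, not from the paper.

```python
def pgd_linf(x, w, eps=0.1, alpha=0.05, steps=10):
    """Minimize the margin w·adv within ||adv - x||_inf <= eps.
    For a linear scorer the loss gradient is just w, so each step
    moves along -sign(w) and then projects back into the eps-ball."""
    adv = list(x)
    for _ in range(steps):
        for i in range(len(adv)):
            step = 1.0 if w[i] > 0 else (-1.0 if w[i] < 0 else 0.0)
            adv[i] -= alpha * step
            # projection: clip each coordinate to the eps-ball around x
            adv[i] = max(x[i] - eps, min(x[i] + eps, adv[i]))
    return adv
```

Adversarial training, which DRA augments, folds examples like these into the training loop so the classifier learns to score them correctly.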


Details

Domains
vision
Model Types
cnn, diffusion, transformer
Threat Tags
white_box, digital, inference_time, training_time
Datasets
CIFAR-10, CIFAR-100, ImageNet
Applications
image classification, adversarial robustness