Defense · 2025

Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation

Hongsin Lee, Hye Won Chung

0 citations · 82 references · arXiv


Published on arXiv: 2512.10275

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SAAD consistently improves AutoAttack robustness over prior adversarial distillation methods across CIFAR-10, CIFAR-100, and Tiny-ImageNet, especially when using stronger teacher models that previously caused robust saturation.

SAAD (Sample-wise Adaptive Adversarial Distillation)

Novel technique introduced


Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, existing work often neglects to incorporate state-of-the-art robust teachers. Through extensive analysis, we find that stronger teachers do not necessarily yield more robust students, a phenomenon known as robust saturation. While typically attributed to capacity gaps, we show that such explanations are incomplete. Instead, we identify adversarial transferability (the fraction of student-crafted adversarial examples that remain effective against the teacher) as a key factor in successful robustness transfer. Based on this insight, we propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that SAAD consistently improves AutoAttack robustness over prior methods. Our code is available at https://github.com/HongsinLee/saad.


Key Contributions

  • Identifies adversarial transferability (fraction of student-crafted adversarial examples that remain effective against the teacher) as the key factor explaining why stronger teachers can fail to improve student robustness in adversarial distillation.
  • Proposes SAAD, which reweights training samples by their measured transferability to downweight high-variance non-transferable samples and adds an inverse-transferability-weighted clean distillation term.
  • Demonstrates consistent AutoAttack robustness improvements over prior adversarial distillation methods on CIFAR-10, CIFAR-100, and Tiny-ImageNet with no additional computational cost.

🛡️ Threat Analysis

Input Manipulation Attack

SAAD is an adversarial training defense — it improves resistance to adversarial perturbations (evaluated with AutoAttack) by reformulating the adversarial distillation loss to emphasize transferable adversarial samples, directly hardening models against inference-time input manipulation attacks.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, inference_time, training_time
Datasets
CIFAR-10, CIFAR-100, Tiny-ImageNet
Applications
image classification