Defense · 2025

Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation

Hongsin Lee, Hye Won Chung

0 citations · 82 references · arXiv


Published on arXiv: 2512.10275

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SAAD consistently improves AutoAttack robustness over prior adversarial distillation methods across CIFAR-10, CIFAR-100, and Tiny-ImageNet, especially when using stronger teacher models that previously caused robust saturation.

SAAD (Sample-wise Adaptive Adversarial Distillation)

Novel technique introduced


Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, existing work often neglects to incorporate state-of-the-art robust teachers. Through extensive analysis, we find that stronger teachers do not necessarily yield more robust students, a phenomenon known as robust saturation. While typically attributed to capacity gaps, we show that such explanations are incomplete. Instead, we identify adversarial transferability (the fraction of student-crafted adversarial examples that remain effective against the teacher) as a key factor in successful robustness transfer. Based on this insight, we propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that SAAD consistently improves AutoAttack robustness over prior methods. Our code is available at https://github.com/HongsinLee/saad.


Key Contributions

  • Identifies adversarial transferability (fraction of student-crafted adversarial examples that remain effective against the teacher) as the key factor explaining why stronger teachers can fail to improve student robustness in adversarial distillation.
  • Proposes SAAD, which reweights training samples by their measured transferability to downweight high-variance non-transferable samples and adds an inverse-transferability-weighted clean distillation term.
  • Demonstrates consistent AutoAttack robustness improvements over prior adversarial distillation methods on CIFAR-10, CIFAR-100, and Tiny-ImageNet with no additional computational cost.

🛡️ Threat Analysis

Input Manipulation Attack

SAAD is an adversarial training defense — it improves resistance to adversarial perturbations (evaluated with AutoAttack) by reformulating the adversarial distillation loss to emphasize transferable adversarial samples, directly hardening models against inference-time input manipulation attacks.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, inference_time, training_time
Datasets
CIFAR-10, CIFAR-100, Tiny-ImageNet
Applications
image classification