defense 2025

AdaGAT: Adaptive Guidance Adversarial Training for the Robustness of Deep Neural Networks

Zhenyu Liu 1, Huizhi Liang 1, Xinrun Li 1, Vaclav Snasel 2, Varun Ojha 1

0 citations · The 8th Chinese Conference on ...


Published on arXiv · 2508.17265

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Maintaining the guide model within a dynamically adjusted clean-accuracy range consistently improves target model (WideResNet-34-10) robust accuracy across adversarial attacks compared to baseline adversarial distillation methods including LBGAT.

AdaGAT (Adaptive Guidance Adversarial Training)

Novel technique introduced


Adversarial distillation (AD) is a knowledge distillation technique that transfers robustness from a teacher deep neural network (DNN) to a lightweight target (student) DNN, enabling the target model to perform better than a student trained independently. Prior work has used a small, learnable teacher (guide) model to improve the robustness of a student model. Because a learnable guide model starts learning from scratch, maintaining it in an optimal state for effective knowledge transfer during co-training is challenging. We therefore propose a novel Adaptive Guidance Adversarial Training (AdaGAT) method. AdaGAT dynamically adjusts the training state of the guide model to instill robustness in the target model. Specifically, we develop two separate loss functions as part of the AdaGAT method, allowing the guide model to participate more actively in backpropagation and reach its optimal state. We evaluated our approach in extensive experiments on three datasets — CIFAR-10, CIFAR-100, and TinyImageNet — using WideResNet-34-10 as the target model. Our observations reveal that keeping the guide model within a certain accuracy range enhances the target model's robustness across various adversarial attacks compared to a variety of baseline models.


Key Contributions

  • AdaGAT method that dynamically adjusts a learnable guide model's training state to maximize knowledge transfer of robustness to a larger target model
  • Two separate loss functions (adaptive MSE/RMSE loss alongside the shared adversarial training loss) enabling the guide model to actively participate in backpropagation and maintain an optimal accuracy range
  • Empirical validation on CIFAR-10, CIFAR-100, and TinyImageNet showing improved robustness over multiple adversarial training baselines
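The mechanism described above — a distillation signal that is only applied while the guide model sits inside a clean-accuracy band — can be sketched in a few lines. This is an illustrative assumption of how the gating might work, not the paper's exact loss: the function name, the MSE-on-logits choice, and the accuracy thresholds are all hypothetical.

```python
import numpy as np

def adaptive_guidance_loss(target_logits, guide_logits, guide_clean_acc,
                           acc_low=0.4, acc_high=0.7):
    """Hypothetical AdaGAT-style adaptive guidance term.

    The guide-to-target distillation signal (here an MSE between
    logits) is applied only while the guide's clean accuracy stays
    inside a target range; outside the range the term is switched
    off so the guide can recover. Thresholds are illustrative.
    """
    mse = float(np.mean((target_logits - guide_logits) ** 2))
    # Gate: full guidance inside the accuracy band, none outside.
    weight = 1.0 if acc_low <= guide_clean_acc <= acc_high else 0.0
    return weight * mse
```

In training, this term would be added to the shared adversarial training loss, and a second loss (the paper's adaptive MSE/RMSE component) would update the guide model itself via backpropagation.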

🛡️ Threat Analysis

Input Manipulation Attack

AdaGAT is a defense against adversarial examples — it proposes an improved adversarial training methodology using adaptive guidance distillation to harden image classifiers against gradient-based input manipulation attacks (PGD, AutoAttack, etc.) at inference time.
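For context, the gradient-based input manipulation that AdaGAT defends against can be shown with a minimal PGD sketch. To keep the example self-contained and exactly differentiable, it attacks a toy logistic model rather than a deep network; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def pgd_attack(x, w, b, y, eps=0.1, alpha=0.02, steps=10):
    """Minimal untargeted PGD sketch on sigmoid(w @ x + b).

    Repeats a signed-gradient ascent step on the binary
    cross-entropy loss, projecting back onto the L-infinity ball
    of radius eps around the original input after each step.
    """
    x_adv = x.copy()
    for _ in range(steps):
        z = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-z))
        # Gradient of binary cross-entropy w.r.t. the input.
        grad = (p - y) * w
        # Ascent step, then projection onto the eps-ball.
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Against a real classifier the same loop runs on autograd gradients, which is the PGD threat model the paper's robust-accuracy evaluations use.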


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box · inference_time · untargeted · digital
Datasets
CIFAR-10 · CIFAR-100 · TinyImageNet
Applications
image classification