defense 2025

AdaGAT: Adaptive Guidance Adversarial Training for the Robustness of Deep Neural Networks

Zhenyu Liu 1, Huizhi Liang 1, Xinrun Li 1, Vaclav Snasel 2, Varun Ojha 1

0 citations · The 8th Chinese Conference on ...


Published on arXiv · 2508.17265

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Maintaining the guide model within a dynamically adjusted clean-accuracy range consistently improves target model (WideResNet-34-10) robust accuracy across adversarial attacks compared to baseline adversarial distillation methods including LBGAT.

AdaGAT (Adaptive Guidance Adversarial Training)

Novel technique introduced


Adversarial distillation (AD) is a knowledge distillation technique that transfers robustness from a teacher deep neural network (DNN) to a lightweight target (student) DNN, enabling the target model to perform better than a student trained independently. Prior work has used a small, learnable teacher (guide) model to improve the robustness of a student model. Because a learnable guide model starts learning from scratch, maintaining it in an optimal state for effective knowledge transfer during co-training is challenging. We therefore propose a novel Adaptive Guidance Adversarial Training (AdaGAT) method. AdaGAT dynamically adjusts the training state of the guide model to instill robustness in the target model. Specifically, we develop two separate loss functions as part of the AdaGAT method, allowing the guide model to participate more actively in backpropagation and reach its optimal state. We evaluated our approach in extensive experiments on three datasets — CIFAR-10, CIFAR-100, and TinyImageNet — using WideResNet-34-10 as the target model. Our observations reveal that keeping the guide model within a certain accuracy range enhances the target model's robustness across various adversarial attacks compared to a variety of baseline models.


Key Contributions

  • AdaGAT method that dynamically adjusts a learnable guide model's training state to maximize knowledge transfer of robustness to a larger target model
  • Two separate loss functions (adaptive MSE/RMSE loss alongside the shared adversarial training loss) enabling the guide model to actively participate in backpropagation and maintain an optimal accuracy range
  • Empirical validation on CIFAR-10, CIFAR-100, and TinyImageNet showing improved robustness over multiple adversarial training baselines
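The mechanism described above — a distillation signal that is only applied while the guide model sits inside a clean-accuracy band — can be sketched in a few lines. This is an illustrative assumption of how the gating might work, not the paper's exact loss: the function name, the MSE-on-logits choice, and the accuracy thresholds are all hypothetical.

```python
import numpy as np

def adaptive_guidance_loss(target_logits, guide_logits, guide_clean_acc,
                           acc_low=0.4, acc_high=0.7):
    """Hypothetical AdaGAT-style adaptive guidance term.

    The guide-to-target distillation signal (here an MSE between
    logits) is applied only while the guide's clean accuracy stays
    inside a target range; outside the range the term is switched
    off so the guide can recover. Thresholds are illustrative.
    """
    mse = float(np.mean((target_logits - guide_logits) ** 2))
    # Gate: full guidance inside the accuracy band, none outside.
    weight = 1.0 if acc_low <= guide_clean_acc <= acc_high else 0.0
    return weight * mse
```

In training, this term would be added to the shared adversarial training loss, and a second loss (the paper's adaptive MSE/RMSE component) would update the guide model itself via backpropagation.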

🛡️ Threat Analysis

Input Manipulation Attack

AdaGAT is a defense against adversarial examples — it proposes an improved adversarial training methodology using adaptive guidance distillation to harden image classifiers against gradient-based input manipulation attacks (PGD, AutoAttack, etc.) at inference time.
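For context, the gradient-based input manipulation that AdaGAT defends against can be shown with a minimal PGD sketch. To keep the example self-contained and exactly differentiable, it attacks a toy logistic model rather than a deep network; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def pgd_attack(x, w, b, y, eps=0.1, alpha=0.02, steps=10):
    """Minimal untargeted PGD sketch on sigmoid(w @ x + b).

    Repeats a signed-gradient ascent step on the binary
    cross-entropy loss, projecting back onto the L-infinity ball
    of radius eps around the original input after each step.
    """
    x_adv = x.copy()
    for _ in range(steps):
        z = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-z))
        # Gradient of binary cross-entropy w.r.t. the input.
        grad = (p - y) * w
        # Ascent step, then projection onto the eps-ball.
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Against a real classifier the same loop runs on autograd gradients, which is the PGD threat model the paper's robust-accuracy evaluations use.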


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box · inference_time · untargeted · digital
Datasets
CIFAR-10 · CIFAR-100 · TinyImageNet
Applications
image classification