
Learning Robustness at Test-Time from a Non-Robust Teacher

Stefano Bianchettin, Giulio Rossolini, Giorgio Buttazzo



Published on arXiv: 2604.11590

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves improved optimization stability and a better robustness-accuracy trade-off than existing baselines on CIFAR-10 and ImageNet under photometric transformations in test-time adaptation settings.


Pretrained models are increasingly used as general-purpose backbones and adapted at test time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: can a pretrained, non-robust model be adapted at test time to improve adversarial robustness on a target distribution? To investigate this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To overcome these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. It further provides theoretical insights showing that this formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights, showing that the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.


Key Contributions

  • Label-free test-time adaptation framework that uses non-robust teacher predictions as semantic anchors for adversarial training
  • Theoretical analysis showing improved stability over self-consistency-based regularization in classical adversarial training
  • Demonstrates better robustness-accuracy trade-off and lower hyperparameter sensitivity in unsupervised test-time settings
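To illustrate the anchoring idea, the sketch below is a minimal, hypothetical toy version of the method in numpy (not the paper's implementation, models, or hyperparameters): a linear softmax "student" is adapted on one unlabeled target sample, with the frozen non-robust teacher's prediction used as a fixed soft target for both the clean input and an FGSM-style adversarial input. All names and values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce(q, p):
    # cross-entropy of soft target q against prediction p
    return -np.sum(q * np.log(p + 1e-12))

# Hypothetical toy setup: a linear softmax "student" adapted at test time.
d, k = 8, 3
W = rng.normal(scale=0.1, size=(k, d))   # student weights (adapted)
x = rng.normal(size=d)                   # one unlabeled target sample
q = softmax(rng.normal(size=k))          # frozen non-robust teacher prediction (semantic anchor)

eps, beta, lr = 0.1, 1.0, 0.05           # attack budget, adversarial weight, step size

def loss_and_grad(W, x, q, eps, beta):
    # clean branch, anchored to the teacher
    p = softmax(W @ x)
    # FGSM-style perturbation: grad of CE wrt the input is W^T (p - q) for a linear model
    g_x = W.T @ (p - q)
    x_adv = x + eps * np.sign(g_x)
    p_adv = softmax(W @ x_adv)
    # both objectives use the same fixed teacher target q (no moving self-consistency target)
    loss = ce(q, p) + beta * ce(q, p_adv)
    # grad wrt W, treating x_adv as fixed, as in standard adversarial training
    g_W = np.outer(p - q, x) + beta * np.outer(p_adv - q, x_adv)
    return loss, g_W, x_adv

loss0, g_W, x_adv = loss_and_grad(W, x, q, eps, beta)
W = W - lr * g_W                         # one anchored adaptation step

# re-evaluate on the same fixed adversarial point
loss1 = ce(q, softmax(W @ x)) + beta * ce(q, softmax(W @ x_adv))
```

Because the anchor `q` is frozen, both loss terms pull the student toward the same fixed target, which is the intuition behind the claimed stability advantage over self-consistency regularizers whose target moves with the student.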

🛡️ Threat Analysis

Input Manipulation Attack

The paper directly addresses adversarial robustness at test time by adapting models to defend against adversarial perturbations. It proposes a defense framework that integrates adversarial training strategies into test-time adaptation schemes, evaluated against adversarial examples on CIFAR-10 and ImageNet.
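To make the input-manipulation threat concrete, here is a minimal PGD-style attack sketch against a hypothetical toy linear classifier (illustrative only; not the paper's attack configuration). It takes iterated signed-gradient steps to increase the model's loss while projecting back into an L-infinity ball around the clean input.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical victim: a toy linear classifier (illustrative only).
d, k = 8, 3
W = rng.normal(scale=0.5, size=(k, d))
x0 = rng.normal(size=d)                  # clean input
y = int(np.argmax(W @ x0))               # the model's own clean prediction
onehot = np.eye(k)[y]

eps, alpha, steps = 0.3, 0.1, 10         # L-inf budget, step size, iterations

x = x0.copy()
for _ in range(steps):
    p = softmax(W @ x)
    g = W.T @ (p - onehot)               # grad of cross-entropy wrt the input (linear model)
    x = x + alpha * np.sign(g)           # ascend the loss (signed-gradient step)
    x = np.clip(x, x0 - eps, x0 + eps)   # project back into the L-inf ball around x0
```

After the loop, `x` is a bounded perturbation of `x0` with a strictly higher cross-entropy on the model's original prediction; this is the kind of inference-time perturbation the paper's test-time adaptation defense targets.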


Details

Domains
vision
Model Types
cnn
Threat Tags
inference_time, digital
Datasets
CIFAR-10, ImageNet
Applications
image classification