Learning Robustness at Test-Time from a Non-Robust Teacher
Stefano Bianchettin, Giulio Rossolini, Giorgio Buttazzo
Published on arXiv (2604.11590)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves more stable optimization and a better robustness-accuracy trade-off than existing baselines on CIFAR-10 and ImageNet under photometric transformations in test-time adaptation settings
Nowadays, pretrained models are increasingly used as general-purpose backbones and adapted at test-time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: *can a pretrained, non-robust model be adapted at test-time to improve adversarial robustness on a target distribution?* To address this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To address these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. The work further provides theoretical insights showing that this formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights by showing that the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.
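The core idea — anchoring both the clean and adversarial objectives to a frozen non-robust teacher's predictions, with no labels involved — can be sketched numerically. The toy NumPy example below is an illustrative assumption, not the authors' implementation: it uses a linear student, a one-step FGSM-style perturbation, and KL divergence to the teacher's soft predictions as the anchor loss.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # mean KL(p || q) over a batch of probability vectors
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean()

# Toy linear student: logits = x @ W (stand-in for the adapted model)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)) * 0.1
x = rng.normal(size=(8, 4))

# Frozen non-robust teacher predictions act as the semantic anchor;
# here they are random soft labels purely for illustration.
teacher_probs = softmax(rng.normal(size=(8, 3)))

def student_probs(x, W):
    return softmax(x @ W)

def anchored_loss(x, W, lam=1.0, eps_adv=0.05):
    """Label-free anchored objective: KL to the teacher on the clean
    input plus KL to the same teacher anchor on an adversarial input."""
    def clean_kl(x0):
        return kl(teacher_probs, student_probs(x0, W))

    # Numeric gradient of the anchor loss w.r.t. x (central differences),
    # used to craft a one-step FGSM-style perturbation.
    h = 1e-4
    g = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            xp = x.copy(); xp[i, j] += h
            xm = x.copy(); xm[i, j] -= h
            g[i, j] = (clean_kl(xp) - clean_kl(xm)) / (2 * h)
    x_adv = x + eps_adv * np.sign(g)

    clean = clean_kl(x)
    adv = kl(teacher_probs, student_probs(x_adv, W))
    return clean, adv, clean + lam * adv
```

Note that the teacher anchor replaces both the ground-truth label of supervised adversarial training and the student's own clean prediction used in self-consistency regularization, which is the substitution the paper's stability analysis concerns.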
Key Contributions
- Label-free test-time adaptation framework that uses non-robust teacher predictions as semantic anchors for adversarial training
- Theoretical analysis showing improved stability over self-consistency-based regularization in classical adversarial training
- Demonstrates better robustness-accuracy trade-off and lower hyperparameter sensitivity in unsupervised test-time settings
🛡️ Threat Analysis
The paper directly addresses adversarial robustness at test time by adapting models to defend against adversarial perturbations. It proposes a defense framework that integrates adversarial training strategies into test-time adaptation schemes, evaluated against adversarial examples on CIFAR-10 and ImageNet.