Learning Robustness at Test-Time from a Non-Robust Teacher
Stefano Bianchettin, Giulio Rossolini, Giorgio Buttazzo
Published on arXiv (2604.11590)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves more stable optimization and a better robustness-accuracy trade-off than existing baselines on CIFAR-10 and ImageNet under photometric transformations in test-time adaptation settings
Nowadays, pretrained models are increasingly used as general-purpose backbones and adapted at test-time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: *can a pretrained, non-robust model be adapted at test-time to improve adversarial robustness on a target distribution?* To address this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To address these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. The work further provides theoretical insights showing that this formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights by showing that the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.
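The core idea — anchoring both the clean and adversarial objectives to a frozen non-robust teacher's predictions, with no labels involved — can be sketched numerically. The toy NumPy example below is an illustrative assumption, not the authors' implementation: it uses a linear student, a one-step FGSM-style perturbation, and KL divergence to the teacher's soft predictions as the anchor loss.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # mean KL(p || q) over a batch of probability vectors
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean()

# Toy linear student: logits = x @ W (stand-in for the adapted model)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)) * 0.1
x = rng.normal(size=(8, 4))

# Frozen non-robust teacher predictions act as the semantic anchor;
# here they are random soft labels purely for illustration.
teacher_probs = softmax(rng.normal(size=(8, 3)))

def student_probs(x, W):
    return softmax(x @ W)

def anchored_loss(x, W, lam=1.0, eps_adv=0.05):
    """Label-free anchored objective: KL to the teacher on the clean
    input plus KL to the same teacher anchor on an adversarial input."""
    def clean_kl(x0):
        return kl(teacher_probs, student_probs(x0, W))

    # Numeric gradient of the anchor loss w.r.t. x (central differences),
    # used to craft a one-step FGSM-style perturbation.
    h = 1e-4
    g = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            xp = x.copy(); xp[i, j] += h
            xm = x.copy(); xm[i, j] -= h
            g[i, j] = (clean_kl(xp) - clean_kl(xm)) / (2 * h)
    x_adv = x + eps_adv * np.sign(g)

    clean = clean_kl(x)
    adv = kl(teacher_probs, student_probs(x_adv, W))
    return clean, adv, clean + lam * adv
```

Note that the teacher anchor replaces both the ground-truth label of supervised adversarial training and the student's own clean prediction used in self-consistency regularization, which is the substitution the paper's stability analysis concerns.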
Key Contributions
- Label-free test-time adaptation framework that uses non-robust teacher predictions as semantic anchors for adversarial training
- Theoretical analysis showing improved stability over self-consistency-based regularization in classical adversarial training
- Demonstrates better robustness-accuracy trade-off and lower hyperparameter sensitivity in unsupervised test-time settings
🛡️ Threat Analysis
The paper directly addresses adversarial robustness at test time by adapting models to defend against adversarial perturbations. It proposes a defense framework that integrates adversarial training strategies into test-time adaptation schemes, evaluated against adversarial examples on CIFAR-10 and ImageNet.