Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
Jonas Ngnawé 1,2, Maxime Heuillet 1,2, Sabyasachi Sahoo 3, Yann Pequignot 1,2,4, Ola Ahmad 1,2, Audrey Durand 5, Frédéric Precioso 1,2, Christian Gagné 1,2,4
Published on arXiv: 2509.23325
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Epsilon-Scheduling consistently prevents suboptimal transfer and improves expected robustness across six pretrained models and five datasets under diverse perturbation configurations.
Epsilon-Scheduling
Novel technique introduced
Fine-tuning pretrained models is a standard and effective workflow in modern machine learning. However, robust fine-tuning (RFT), which aims to simultaneously achieve adaptation to a downstream task and robustness to adversarial examples, remains challenging. Despite the abundance of non-robust pretrained models in open-source repositories, their potential for RFT is not well understood. We address this knowledge gap by systematically examining RFT from such non-robust models. Our experiments reveal that fine-tuning non-robust models with a robust objective, even under small perturbations, can lead to poor performance, a phenomenon that we dub 'suboptimal transfer'. In challenging scenarios (e.g., difficult tasks, high perturbation), the resulting performance can be so low that it may be considered a transfer failure. We find that fine-tuning using a robust objective impedes task adaptation at the beginning of training and eventually prevents optimal transfer. To address this, we propose a novel heuristic, 'Epsilon-Scheduling', a schedule over the perturbation strength used during training that promotes optimal transfer. Additionally, we introduce 'expected robustness', a metric that captures performance across a range of perturbations, providing a more comprehensive evaluation of the accuracy-robustness trade-off for diverse models at test time. Extensive experiments on a wide range of configurations (six pretrained models and five datasets) show that Epsilon-Scheduling successfully prevents suboptimal transfer and consistently improves expected robustness.
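As a rough illustration of the 'expected robustness' metric described above, adversarial accuracy can be averaged over a grid of perturbation strengths. This is only a sketch: the uniform weighting over epsilons, the function names, and the grid are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def expected_robustness(accuracy_at_eps, eps_grid):
    """Approximate E_eps[acc(eps)]: the expectation of (adversarial) accuracy
    over a range of test-time perturbation strengths.

    accuracy_at_eps: callable mapping a perturbation strength eps to the
        model's accuracy under attacks of that strength (assumed interface).
    eps_grid: iterable of perturbation strengths to average over; uniform
        weighting over the grid is an assumption made here for illustration.
    """
    accs = np.array([accuracy_at_eps(eps) for eps in eps_grid])
    return float(accs.mean())

# Toy usage: a model whose accuracy decays linearly with perturbation strength.
toy_accuracy = lambda eps: 1.0 - eps
score = expected_robustness(toy_accuracy, [0.0, 0.25, 0.5, 0.75, 1.0])  # 0.5
```

A single scalar like this summarizes the whole accuracy-robustness curve, which is what makes it convenient for comparing models evaluated at many perturbation levels.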
Key Contributions
- Identifies and characterizes 'suboptimal transfer' — the phenomenon where robust fine-tuning of non-robust pretrained models severely degrades clean accuracy, up to near-random performance
- Proposes Epsilon-Scheduling, a two-hinge linear schedule over perturbation strength during training that starts at zero and ramps to the target epsilon, preventing suboptimal transfer
- Introduces 'expected robustness', a new evaluation metric capturing the expectation of accuracy over the full perturbation range for a more comprehensive accuracy–robustness trade-off assessment
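A two-hinge linear ramp of the kind described in the second contribution can be sketched as below. The hinge positions (warm-up and ramp fractions) and parameter names are illustrative assumptions; the summary does not state the paper's exact schedule values.

```python
def epsilon_schedule(step, total_steps, eps_target,
                     warmup_frac=0.1, ramp_frac=0.4):
    """Two-hinge linear epsilon schedule (sketch, assumed hinge fractions).

    Holds the perturbation strength at 0 for the first `warmup_frac` of
    training (letting task adaptation begin unimpeded), ramps linearly to
    `eps_target` over the next `ramp_frac`, then holds at `eps_target`.
    """
    hinge1 = warmup_frac * total_steps                # end of zero-epsilon phase
    hinge2 = (warmup_frac + ramp_frac) * total_steps  # end of linear ramp
    if step < hinge1:
        return 0.0
    if step < hinge2:
        return eps_target * (step - hinge1) / (hinge2 - hinge1)
    return eps_target

# With total_steps=100: eps is 0 for steps 0-9, ramps over steps 10-49,
# and stays at eps_target from step 50 onward.
```

In an adversarial-training loop, the returned value would set the attack budget (e.g., the PGD epsilon) for the current step, so early training behaves like standard fine-tuning.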
🛡️ Threat Analysis
The paper is fundamentally about defending fine-tuned models against adversarial examples: it diagnoses a failure mode of adversarial training during fine-tuning (suboptimal transfer) and proposes Epsilon-Scheduling as a defense that makes adversarial training effective in this setting. This is squarely a robustness/adversarial-defense contribution.