Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
Jonas Ngnawé 1,2, Maxime Heuillet 1,2, Sabyasachi Sahoo 3, Yann Pequignot 1,2,4, Ola Ahmad 1,2, Audrey Durand 5, Frédéric Precioso 1,2, Christian Gagné 1,2,4
Published on arXiv: 2509.23325
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Epsilon-Scheduling consistently prevents suboptimal transfer and improves expected robustness across six pretrained models and five datasets under diverse perturbation configurations.
Epsilon-Scheduling
Novel technique introduced
Fine-tuning pretrained models is a standard and effective workflow in modern machine learning. However, robust fine-tuning (RFT), which aims to simultaneously achieve adaptation to a downstream task and robustness to adversarial examples, remains challenging. Despite the abundance of non-robust pretrained models in open-source repositories, their potential for RFT is not well understood. We address this knowledge gap by systematically examining RFT from such non-robust models. Our experiments reveal that fine-tuning non-robust models with a robust objective, even under small perturbations, can lead to poor performance, a phenomenon that we dub 'suboptimal transfer'. In challenging scenarios (e.g., difficult tasks, high perturbation), the resulting performance can be so low that it may be considered a transfer failure. We find that fine-tuning using a robust objective impedes task adaptation at the beginning of training and eventually prevents optimal transfer. To address this, we propose a novel heuristic, 'Epsilon-Scheduling', a schedule over the perturbation strength used during training that promotes optimal transfer. Additionally, we introduce 'expected robustness', a metric that captures performance across a range of perturbations, providing a more comprehensive evaluation of the accuracy-robustness trade-off for diverse models at test time. Extensive experiments on a wide range of configurations (six pretrained models and five datasets) show that Epsilon-Scheduling successfully prevents suboptimal transfer and consistently improves expected robustness.
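As a rough illustration of the 'expected robustness' metric described above, adversarial accuracy can be averaged over a grid of perturbation strengths. This is only a sketch: the uniform weighting over epsilons, the function names, and the grid are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def expected_robustness(accuracy_at_eps, eps_grid):
    """Approximate E_eps[acc(eps)]: the expectation of (adversarial) accuracy
    over a range of test-time perturbation strengths.

    accuracy_at_eps: callable mapping a perturbation strength eps to the
        model's accuracy under attacks of that strength (assumed interface).
    eps_grid: iterable of perturbation strengths to average over; uniform
        weighting over the grid is an assumption made here for illustration.
    """
    accs = np.array([accuracy_at_eps(eps) for eps in eps_grid])
    return float(accs.mean())

# Toy usage: a model whose accuracy decays linearly with perturbation strength.
toy_accuracy = lambda eps: 1.0 - eps
score = expected_robustness(toy_accuracy, [0.0, 0.25, 0.5, 0.75, 1.0])  # 0.5
```

A single scalar like this summarizes the whole accuracy-robustness curve, which is what makes it convenient for comparing models evaluated at many perturbation levels.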
Key Contributions
- Identifies and characterizes 'suboptimal transfer' — the phenomenon where robust fine-tuning of non-robust pretrained models severely degrades clean accuracy, up to near-random performance
- Proposes Epsilon-Scheduling, a two-hinge linear schedule over perturbation strength during training that starts at zero and ramps to the target epsilon, preventing suboptimal transfer
- Introduces 'expected robustness', a new evaluation metric capturing the expectation of accuracy over the full perturbation range for a more comprehensive accuracy–robustness trade-off assessment
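A two-hinge linear ramp of the kind described in the second contribution can be sketched as below. The hinge positions (warm-up and ramp fractions) and parameter names are illustrative assumptions; the summary does not state the paper's exact schedule values.

```python
def epsilon_schedule(step, total_steps, eps_target,
                     warmup_frac=0.1, ramp_frac=0.4):
    """Two-hinge linear epsilon schedule (sketch, assumed hinge fractions).

    Holds the perturbation strength at 0 for the first `warmup_frac` of
    training (letting task adaptation begin unimpeded), ramps linearly to
    `eps_target` over the next `ramp_frac`, then holds at `eps_target`.
    """
    hinge1 = warmup_frac * total_steps                # end of zero-epsilon phase
    hinge2 = (warmup_frac + ramp_frac) * total_steps  # end of linear ramp
    if step < hinge1:
        return 0.0
    if step < hinge2:
        return eps_target * (step - hinge1) / (hinge2 - hinge1)
    return eps_target

# With total_steps=100: eps is 0 for steps 0-9, ramps over steps 10-49,
# and stays at eps_target from step 50 onward.
```

In an adversarial-training loop, the returned value would set the attack budget (e.g., the PGD epsilon) for the current step, so early training behaves like standard fine-tuning.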
🛡️ Threat Analysis
The paper is fundamentally about defending fine-tuned models against adversarial examples: it diagnoses a failure mode of adversarial training during fine-tuning (suboptimal transfer) and proposes Epsilon-Scheduling as a defense that makes adversarial training effective in this setting. This is squarely a robustness/adversarial-defense contribution.