Defense · 2025

FERD: Fairness-Enhanced Data-Free Robustness Distillation

Zhengxiao Li 1, Liming Lu 1, Xu Zheng 2,3, Siyuan Liang 4, Zhenghan Chen 5, Yongbin Zhou 1, Shuchao Pang 1

0 citations · 49 references · arXiv


Published on arXiv: 2509.20793

Input Manipulation Attack

OWASP ML Top 10: ML01

Key Finding

Improves worst-class robustness under FGSM and AutoAttack by 15.1% and 6.4%, respectively, with MobileNet-V2 on CIFAR-10, over the baseline data-free robustness distillation method.

FERD (Fairness-Enhanced Data-Free Robustness Distillation)

Novel technique introduced


Data-Free Robustness Distillation (DFRD) aims to transfer robustness from a teacher to a student without access to the training data. While existing methods focus on overall robustness, they overlook robust fairness, leading to severe disparities in robustness across categories. In this paper, we identify two key problems: (1) a student model distilled on data with equal class proportions behaves significantly differently across distinct categories; and (2) the student's robustness is unstable across different attack targets. To bridge these gaps, we present the first Fairness-Enhanced data-free Robustness Distillation (FERD) framework, which adjusts both the proportion and the distribution of adversarial examples. For the proportion, FERD adopts a robustness-guided class reweighting strategy that synthesizes more samples for the less robust categories, thereby improving their robustness. For the distribution, FERD generates complementary data samples for advanced robustness distillation. It first generates Fairness-Aware Examples (FAEs) by enforcing a uniformity constraint on feature-level predictions, which suppresses the dominance of class-specific non-robust features and provides a more balanced representation across all categories. It then constructs Uniform-Target Adversarial Examples (UTAEs) from FAEs by applying a uniform target-class constraint, which distributes attack targets across all categories, avoids biased attack directions, and prevents overfitting to specific vulnerable categories. Extensive experiments on three public datasets show that FERD achieves state-of-the-art worst-class robustness under all evaluated adversarial attacks (e.g., worst-class robustness under FGSM and AutoAttack improves by 15.1% and 6.4% with MobileNet-V2 on CIFAR-10), demonstrating superior performance in both robustness and fairness.
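
As a concrete illustration of the reweighting step, here is a minimal PyTorch sketch: per-class sampling weights grow as per-class robust accuracy falls, so the synthesizer is asked to produce more samples for the weakest classes. The softmax form, the temperature, and the function name are assumptions made for illustration, not the paper's exact rule.

```python
import torch

def robustness_guided_class_weights(per_class_robust_acc: torch.Tensor,
                                    temperature: float = 1.0) -> torch.Tensor:
    """Map per-class robust accuracy to sampling weights (illustrative).

    Less-robust classes receive larger weights, so more samples are
    synthesized for them. The softmax-over-negated-accuracy form is one
    plausible choice; FERD's exact reweighting rule may differ.
    """
    return torch.softmax(-per_class_robust_acc / temperature, dim=0)

# Example: class 2 is the least robust, so it gets the largest weight.
acc = torch.tensor([0.62, 0.55, 0.31, 0.58])
weights = robustness_guided_class_weights(acc)
num_per_class = (weights * 1024).round().long()  # samples to synthesize per class
print(weights, num_per_class)
```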


Key Contributions

  • Identifies and characterizes robust fairness disparity across categories in data-free robustness distillation, showing class-wise robustness gaps are amplified during distillation
  • Introduces Fairness-Aware Examples (FAEs) via a uniformity constraint on feature-level predictions to suppress class-specific non-robust features and balance representation across categories
  • Proposes Uniform-Target Adversarial Examples (UTAEs) constructed from FAEs with a uniform target-class constraint, distributing attack directions evenly across categories to prevent overfitting to vulnerable classes (both example types are sketched below)
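
The two example types can be sketched in PyTorch as follows: FAEs pull the teacher's prediction distribution toward uniform, and UTAEs then run a targeted PGD attack from each FAE toward a target class drawn uniformly at random. The loss weights, step sizes, conditional-generator interface, and the use of softmax outputs in place of the paper's feature-level predictions are all assumptions here, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def generate_fae(generator, teacher, z, labels, uniformity_weight=1.0):
    """Fairness-Aware Examples: synthesize samples whose teacher predictions
    stay close to uniform, suppressing class-specific non-robust features."""
    x = generator(z, labels)                     # conditional generator (assumed)
    logits = teacher(x)
    log_probs = F.log_softmax(logits, dim=1)
    uniform = torch.full_like(log_probs, 1.0 / logits.size(1))
    # Uniformity constraint: penalize divergence from the uniform distribution.
    uniformity_loss = F.kl_div(log_probs, uniform, reduction="batchmean")
    # A standard classification term keeps each sample tied to its own class.
    ce_loss = F.cross_entropy(logits, labels)
    return x, ce_loss + uniformity_weight * uniformity_loss

def generate_utae(teacher, fae, num_classes, eps=8/255, alpha=2/255, steps=10):
    """Uniform-Target Adversarial Examples: targeted PGD starting from FAEs,
    with targets drawn uniformly so attack directions cover all categories."""
    targets = torch.randint(num_classes, (fae.size(0),), device=fae.device)
    x_nat = fae.detach()
    x_adv = x_nat.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(teacher(x_adv), targets)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Targeted step: move against the gradient, toward the sampled target.
        x_adv = (x_adv - alpha * grad.sign()).detach()
        x_adv = x_nat + (x_adv - x_nat).clamp(-eps, eps)  # project to L-inf ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv, targets
```

In a full FERD-style loop, the FAE loss would drive the generator update while the resulting UTAEs feed the robustness-distillation objective for the student; both functions above approximate that procedure rather than reproduce it.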

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is a defense against adversarial input-manipulation attacks (FGSM, PGD, CW, AutoAttack): it proposes a distillation framework that improves the adversarial robustness of student models by generating fairer distributions of adversarial examples during training, directly defending against inference-time input-perturbation attacks.
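
The worst-class robust accuracy cited in the key finding can be measured with a sketch like the one below: attack the test set with FGSM, accumulate per-class robust accuracy, and take the minimum over classes. The epsilon of 8/255 is the usual CIFAR-10 L-infinity budget, assumed here rather than quoted from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Single-step FGSM perturbation inside an L-inf ball of radius eps."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def worst_class_robustness(model, loader, num_classes=10, eps=8/255):
    """Minimum per-class robust accuracy under FGSM across a test loader."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x_adv = fgsm(model, x, y, eps)
        with torch.no_grad():
            preds = model(x_adv).argmax(dim=1)
        for c in range(num_classes):
            mask = y == c
            total[c] += mask.sum()
            correct[c] += (preds[mask] == c).sum()
    per_class = correct / total.clamp(min=1)  # guard against empty classes
    return per_class.min().item(), per_class
```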


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box · inference_time · targeted · untargeted
Datasets
CIFAR-10 · CIFAR-100 · Tiny-ImageNet
Applications
image classification