Defense · 2025

FERD: Fairness-Enhanced Data-Free Robustness Distillation

Zhengxiao Li 1, Liming Lu 1, Xu Zheng 2,3, Siyuan Liang 4, Zhenghan Chen 5, Yongbin Zhou 1, Shuchao Pang 1

0 citations · 49 references · arXiv


Published on arXiv: 2509.20793

Input Manipulation Attack

OWASP ML Top 10: ML01

Key Finding

Improves worst-class robustness under FGSM and AutoAttack by 15.1% and 6.4%, respectively, with MobileNet-V2 on CIFAR-10, over the baseline data-free robustness distillation method.

FERD (Fairness-Enhanced Data-Free Robustness Distillation)

Novel technique introduced


Data-Free Robustness Distillation (DFRD) aims to transfer robustness from a teacher to a student without access to the training data. While existing methods focus on overall robustness, they overlook robust fairness, leading to severe disparities in robustness across categories. In this paper, we identify two key problems: (1) a student model distilled on data with equal class proportions behaves significantly differently across distinct categories; and (2) the student's robustness is unstable across different attack targets. To bridge these gaps, we present the first Fairness-Enhanced data-free Robustness Distillation (FERD) framework, which adjusts both the proportion and the distribution of adversarial examples. For the proportion, FERD adopts a robustness-guided class reweighting strategy that synthesizes more samples for the less robust categories, thereby improving their robustness. For the distribution, FERD generates complementary data samples for advanced robustness distillation. It first generates Fairness-Aware Examples (FAEs) by enforcing a uniformity constraint on feature-level predictions, which suppresses the dominance of class-specific non-robust features and provides a more balanced representation across all categories. It then constructs Uniform-Target Adversarial Examples (UTAEs) from FAEs by applying a uniform target-class constraint, which distributes attack targets across all categories, avoids biased attack directions, and prevents overfitting to specific vulnerable categories. Extensive experiments on three public datasets show that FERD achieves state-of-the-art worst-class robustness under all evaluated adversarial attacks (e.g., worst-class robustness under FGSM and AutoAttack improves by 15.1% and 6.4% with MobileNet-V2 on CIFAR-10), demonstrating superior performance in both robustness and fairness.
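
As a concrete illustration of the reweighting step, here is a minimal PyTorch sketch: per-class sampling weights grow as per-class robust accuracy falls, so the synthesizer is asked to produce more samples for the weakest classes. The softmax form, the temperature, and the function name are assumptions made for illustration, not the paper's exact rule.

```python
import torch

def robustness_guided_class_weights(per_class_robust_acc: torch.Tensor,
                                    temperature: float = 1.0) -> torch.Tensor:
    """Map per-class robust accuracy to sampling weights (illustrative).

    Less-robust classes receive larger weights, so more samples are
    synthesized for them. The softmax-over-negated-accuracy form is one
    plausible choice; FERD's exact reweighting rule may differ.
    """
    return torch.softmax(-per_class_robust_acc / temperature, dim=0)

# Example: class 2 is the least robust, so it gets the largest weight.
acc = torch.tensor([0.62, 0.55, 0.31, 0.58])
weights = robustness_guided_class_weights(acc)
num_per_class = (weights * 1024).round().long()  # samples to synthesize per class
print(weights, num_per_class)
```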


Key Contributions

  • Identifies and characterizes robust fairness disparity across categories in data-free robustness distillation, showing class-wise robustness gaps are amplified during distillation
  • Introduces Fairness-Aware Examples (FAEs) via a uniformity constraint on feature-level predictions to suppress class-specific non-robust features and balance representation across categories
  • Proposes Uniform-Target Adversarial Examples (UTAEs) constructed from FAEs with a uniform target-class constraint, distributing attack directions evenly across categories to prevent overfitting to vulnerable classes (both example types are sketched below)
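
The two example types can be sketched in PyTorch as follows: FAEs pull the teacher's prediction distribution toward uniform, and UTAEs then run a targeted PGD attack from each FAE toward a target class drawn uniformly at random. The loss weights, step sizes, conditional-generator interface, and the use of softmax outputs in place of the paper's feature-level predictions are all assumptions here, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def generate_fae(generator, teacher, z, labels, uniformity_weight=1.0):
    """Fairness-Aware Examples: synthesize samples whose teacher predictions
    stay close to uniform, suppressing class-specific non-robust features."""
    x = generator(z, labels)                     # conditional generator (assumed)
    logits = teacher(x)
    log_probs = F.log_softmax(logits, dim=1)
    uniform = torch.full_like(log_probs, 1.0 / logits.size(1))
    # Uniformity constraint: penalize divergence from the uniform distribution.
    uniformity_loss = F.kl_div(log_probs, uniform, reduction="batchmean")
    # A standard classification term keeps each sample tied to its own class.
    ce_loss = F.cross_entropy(logits, labels)
    return x, ce_loss + uniformity_weight * uniformity_loss

def generate_utae(teacher, fae, num_classes, eps=8/255, alpha=2/255, steps=10):
    """Uniform-Target Adversarial Examples: targeted PGD starting from FAEs,
    with targets drawn uniformly so attack directions cover all categories."""
    targets = torch.randint(num_classes, (fae.size(0),), device=fae.device)
    x_nat = fae.detach()
    x_adv = x_nat.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(teacher(x_adv), targets)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Targeted step: move against the gradient, toward the sampled target.
        x_adv = (x_adv - alpha * grad.sign()).detach()
        x_adv = x_nat + (x_adv - x_nat).clamp(-eps, eps)  # project to L-inf ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv, targets
```

In a full FERD-style loop, the FAE loss would drive the generator update while the resulting UTAEs feed the robustness-distillation objective for the student; both functions above approximate that procedure rather than reproduce it.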

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is a defense against adversarial input-manipulation attacks (FGSM, PGD, CW, AutoAttack): it proposes a distillation framework that improves the adversarial robustness of student models by generating fairer distributions of adversarial examples during training, directly defending against inference-time input-perturbation attacks.
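
The worst-class robust accuracy cited in the key finding can be measured with a sketch like the one below: attack the test set with FGSM, accumulate per-class robust accuracy, and take the minimum over classes. The epsilon of 8/255 is the usual CIFAR-10 L-infinity budget, assumed here rather than quoted from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Single-step FGSM perturbation inside an L-inf ball of radius eps."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def worst_class_robustness(model, loader, num_classes=10, eps=8/255):
    """Minimum per-class robust accuracy under FGSM across a test loader."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x_adv = fgsm(model, x, y, eps)
        with torch.no_grad():
            preds = model(x_adv).argmax(dim=1)
        for c in range(num_classes):
            mask = y == c
            total[c] += mask.sum()
            correct[c] += (preds[mask] == c).sum()
    per_class = correct / total.clamp(min=1)  # guard against empty classes
    return per_class.min().item(), per_class
```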


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box · inference_time · targeted · untargeted
Datasets
CIFAR-10 · CIFAR-100 · Tiny-ImageNet
Applications
image classification