rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks
Published on arXiv: 2603.17628
Input Manipulation Attack
OWASP ML Top 10 — ML01
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves improved robustness to both label noise and adversarial attacks (FGSM, PGD) while maintaining competitive accuracy on clean data across three benchmark datasets
rSDNet
Novel technique introduced
Neural networks are central to modern artificial intelligence, yet their training remains highly sensitive to data contamination. Standard neural classifiers are trained by minimizing the categorical cross-entropy loss, corresponding to maximum likelihood estimation under a multinomial model. While statistically efficient under ideal conditions, this approach is highly vulnerable to contaminated observations, including label noise corrupting supervision in the output space and adversarial perturbations inducing worst-case deviations in the input space. In this paper, we propose a unified and statistically grounded framework for robust neural classification that addresses both forms of contamination within a single learning objective. We formulate neural network training as a minimum-divergence estimation problem and introduce rSDNet, a robust learning algorithm based on the general class of $S$-divergences. The resulting training objective inherits robustness properties from classical statistical estimation, automatically down-weighting aberrant observations through model probabilities. We establish essential population-level properties of rSDNet, including Fisher consistency, classification calibration implying Bayes optimality, and robustness guarantees under uniform label noise and infinitesimal feature contamination. Experiments on three benchmark image classification datasets show that rSDNet improves robustness to label corruption and adversarial attacks while maintaining competitive accuracy on clean data. Our results highlight minimum-divergence learning as a principled and effective framework for robust neural classification under heterogeneous data contamination.
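The exact rSDNet objective is not reproduced in this summary, but the down-weighting mechanism the abstract describes can be illustrated with the density power divergence (DPD) loss, a well-known member of the broader $S$-divergence family. A minimal sketch for a single sample (the function name and parameterization here are illustrative, not the paper's):

```python
import math

def dpd_loss(probs, y, beta):
    """Density power divergence loss for one classified sample.

    probs: predicted class probabilities (must sum to 1)
    y:     observed class index
    beta:  robustness parameter; beta -> 0 recovers cross-entropy.

    The gradient of this loss w.r.t. the logits is scaled by
    probs[y] ** beta, so samples the model assigns low probability to
    (e.g. mislabeled ones) contribute less -- the "automatic
    down-weighting through model probabilities" described above.
    """
    if beta == 0.0:  # limiting case: standard cross-entropy
        return -math.log(probs[y])
    return sum(p ** (1.0 + beta) for p in probs) - (1.0 + 1.0 / beta) * probs[y] ** beta

# A confidently wrong label: cross-entropy explodes, DPD stays bounded.
ce = -math.log(0.001)                     # ~6.9, dominated by the bad label
robust = dpd_loss([0.001, 0.999], 0, 0.5)  # < 1.0, bad label is down-weighted
```

The key qualitative difference is boundedness: under cross-entropy a single mislabeled, confidently predicted sample can dominate the gradient, whereas the divergence-based loss caps its influence.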
Key Contributions
- Unified minimum-divergence learning framework (rSDNet) addressing both label noise and adversarial perturbations in a single objective
- Theoretical guarantees including Fisher consistency, classification calibration, and robustness under uniform label noise and infinitesimal feature contamination
- Empirical validation showing improved robustness to label corruption and adversarial attacks while maintaining competitive clean accuracy
🛡️ Threat Analysis
Paper addresses adversarial perturbations in input space causing misclassification at inference time. Evaluates robustness against adversarial attacks (FGSM, PGD) and proposes a defense via robust loss function.
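The FGSM attack evaluated in the paper perturbs each input coordinate by a fixed budget in the direction of the loss gradient's sign, $x' = x + \epsilon\,\mathrm{sign}(\nabla_x L)$. A toy sketch on a logistic model, where the gradient is available in closed form (the model and function names are a hypothetical setup, not the paper's experimental pipeline):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM on a logistic classifier with weights w, bias b.

    For binary cross-entropy, the gradient of the loss w.r.t. the
    input is (p - y) * w, so no autodiff is needed in this toy case.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# A correctly classified point (p > 0.5 for y = 1) is pushed across
# the decision boundary by the perturbation.
w, b = [2.0, -1.0], 0.0
x_adv = fgsm([1.0, 0.5], 1, w, b, eps=1.0)
```

PGD is the iterated variant: the same signed step is applied repeatedly with the result projected back into the $\epsilon$-ball around the original input after each step.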
Paper addresses label noise corrupting training data in output space. Proposes a unified framework that down-weights mislabeled samples during training, defending against data poisoning through noisy labels.
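The uniform label noise model under which the paper states its guarantees is typically simulated by flipping each training label, with some fixed probability, to a class chosen uniformly among the other classes. A minimal sketch of that corruption process (one common formulation; the paper's precise noise model may differ in whether the original class is excluded):

```python
import random

def inject_uniform_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a uniformly chosen *different* class
    with probability noise_rate; otherwise keep it unchanged."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy
```

Training on such corrupted labels is the data-poisoning scenario the robust loss is designed to withstand: the down-weighting mechanism limits how much any flipped label can steer the model.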