rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks
Published on arXiv: 2603.17628
Input Manipulation Attack
OWASP ML Top 10 — ML01
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves improved robustness to both label noise and adversarial attacks (FGSM, PGD) while maintaining competitive accuracy on clean data across three benchmark datasets
rSDNet
Novel technique introduced
Neural networks are central to modern artificial intelligence, yet their training remains highly sensitive to data contamination. Standard neural classifiers are trained by minimizing the categorical cross-entropy loss, corresponding to maximum likelihood estimation under a multinomial model. While statistically efficient under ideal conditions, this approach is highly vulnerable to contaminated observations, including label noise corrupting supervision in the output space and adversarial perturbations inducing worst-case deviations in the input space. In this paper, we propose a unified and statistically grounded framework for robust neural classification that addresses both forms of contamination within a single learning objective. We formulate neural network training as a minimum-divergence estimation problem and introduce rSDNet, a robust learning algorithm based on the general class of $S$-divergences. The resulting training objective inherits robustness properties from classical statistical estimation, automatically down-weighting aberrant observations through model probabilities. We establish essential population-level properties of rSDNet, including Fisher consistency, classification calibration implying Bayes optimality, and robustness guarantees under uniform label noise and infinitesimal feature contamination. Experiments on three benchmark image classification datasets show that rSDNet improves robustness to label corruption and adversarial attacks while maintaining competitive accuracy on clean data. Our results highlight minimum-divergence learning as a principled and effective framework for robust neural classification under heterogeneous data contamination.
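The exact rSDNet objective is not reproduced in this summary, but the down-weighting mechanism the abstract describes can be illustrated with the density power divergence (DPD) loss, a well-known member of the broader $S$-divergence family. A minimal sketch for a single sample (the function name and parameterization here are illustrative, not the paper's):

```python
import math

def dpd_loss(probs, y, beta):
    """Density power divergence loss for one classified sample.

    probs: predicted class probabilities (must sum to 1)
    y:     observed class index
    beta:  robustness parameter; beta -> 0 recovers cross-entropy.

    The gradient of this loss w.r.t. the logits is scaled by
    probs[y] ** beta, so samples the model assigns low probability to
    (e.g. mislabeled ones) contribute less -- the "automatic
    down-weighting through model probabilities" described above.
    """
    if beta == 0.0:  # limiting case: standard cross-entropy
        return -math.log(probs[y])
    return sum(p ** (1.0 + beta) for p in probs) - (1.0 + 1.0 / beta) * probs[y] ** beta

# A confidently wrong label: cross-entropy explodes, DPD stays bounded.
ce = -math.log(0.001)                     # ~6.9, dominated by the bad label
robust = dpd_loss([0.001, 0.999], 0, 0.5)  # < 1.0, bad label is down-weighted
```

The key qualitative difference is boundedness: under cross-entropy a single mislabeled, confidently predicted sample can dominate the gradient, whereas the divergence-based loss caps its influence.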
Key Contributions
- Unified minimum-divergence learning framework (rSDNet) addressing both label noise and adversarial perturbations in a single objective
- Theoretical guarantees including Fisher consistency, classification calibration, and robustness under uniform label noise and infinitesimal feature contamination
- Empirical validation showing improved robustness to label corruption and adversarial attacks while maintaining competitive clean accuracy
🛡️ Threat Analysis
Paper addresses adversarial perturbations in input space causing misclassification at inference time. Evaluates robustness against adversarial attacks (FGSM, PGD) and proposes a defense via robust loss function.
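The FGSM attack evaluated in the paper perturbs each input coordinate by a fixed budget in the direction of the loss gradient's sign, $x' = x + \epsilon\,\mathrm{sign}(\nabla_x L)$. A toy sketch on a logistic model, where the gradient is available in closed form (the model and function names are a hypothetical setup, not the paper's experimental pipeline):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM on a logistic classifier with weights w, bias b.

    For binary cross-entropy, the gradient of the loss w.r.t. the
    input is (p - y) * w, so no autodiff is needed in this toy case.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# A correctly classified point (p > 0.5 for y = 1) is pushed across
# the decision boundary by the perturbation.
w, b = [2.0, -1.0], 0.0
x_adv = fgsm([1.0, 0.5], 1, w, b, eps=1.0)
```

PGD is the iterated variant: the same signed step is applied repeatedly with the result projected back into the $\epsilon$-ball around the original input after each step.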
Paper addresses label noise corrupting training data in output space. Proposes a unified framework that down-weights mislabeled samples during training, defending against data poisoning through noisy labels.
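The uniform label noise model under which the paper states its guarantees is typically simulated by flipping each training label, with some fixed probability, to a class chosen uniformly among the other classes. A minimal sketch of that corruption process (one common formulation; the paper's precise noise model may differ in whether the original class is excluded):

```python
import random

def inject_uniform_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a uniformly chosen *different* class
    with probability noise_rate; otherwise keep it unchanged."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy
```

Training on such corrupted labels is the data-poisoning scenario the robust loss is designed to withstand: the down-weighting mechanism limits how much any flipped label can steer the model.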