Robust Online Learning
Published on arXiv
2602.06775
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The optimal mistake bound in robust online learning equals a new Littlestone-like dimension d, and the agnostic expected regret scales as O(sqrt(dT)) up to logarithmic factors.
Robust Littlestone Dimension
Novel technique introduced
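For background (this is the standard definition, not the paper's robust variant), the classical Littlestone dimension is defined through mistake trees, and it equals the optimal mistake bound in ordinary realizable online learning:

```latex
% Background sketch: the Littlestone dimension of a class H is the largest
% depth d of a complete binary mistake tree shattered by H.
\[
  \mathrm{Ldim}(\mathcal{H})
  = \max\bigl\{ d :\ \exists\ \text{a depth-$d$ binary tree with internal nodes
  labeled by points } x_v \text{ such that every root-to-leaf label sequence }
  (y_1,\dots,y_d) \in \{0,1\}^d \text{ is realized by some } h \in \mathcal{H} \bigr\}
\]
```

The paper's new dimension plays the analogous role for the robust game, where the learner predicts on a perturbed point rather than the clean one.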
We study the problem of learning robust classifiers, where the classifier receives a perturbed input at prediction time. Unlike robust PAC learning studied in prior work, here the clean data point and its label are also adversarially chosen. We formulate this setting as an online learning problem and consider both realizable and agnostic learnability of hypothesis classes. We define a new dimension of classes and show that it controls the mistake bounds in the realizable setting and the regret bounds in the agnostic setting. In contrast to the dimension that characterizes learnability in the PAC setting, our dimension is comparatively simple and resembles the Littlestone dimension. We generalize our dimension to multiclass hypothesis classes and prove similar results in the realizable case. Finally, we study the case where the learner does not know the set of allowed perturbations for each point and has only a prior over them.
Key Contributions
- Formulates robust online learning as an interactive adversarial game where both clean data/labels and input perturbations are adversarially chosen
- Defines a new combinatorial dimension resembling the Littlestone dimension that yields exact mistake bounds in the realizable setting and O(sqrt(dT)) regret in the agnostic setting
- Generalizes the dimension to multiclass hypothesis classes and proves results for uncertain perturbation sets known only up to a finite family
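The O(sqrt(dT)) agnostic rate has the familiar experts-problem shape. As a generic illustration only (not the paper's algorithm, and with d playing the role of log|H|), the Hedge / exponentially-weighted-forecaster scheme over a finite class achieves expected regret at most sqrt(T·ln|H|/2) for losses in [0,1]:

```python
# Generic illustration (not the paper's algorithm): Hedge over |H| = n experts
# attains expected regret <= sqrt(T * ln(n) / 2), the same sqrt-in-T shape as
# the paper's O(sqrt(dT)) bound.
import math

def hedge_expected_regret(losses, eta):
    """losses[t][i] in [0, 1] is expert i's loss at round t; returns the
    learner's expected (weight-averaged) loss minus the best expert's loss."""
    n = len(losses[0])
    weights = [1.0] * n
    learner_loss = 0.0
    for round_losses in losses:
        total = sum(weights)
        # Expected loss of the randomized learner this round.
        learner_loss += sum(w * l for w, l in zip(weights, round_losses)) / total
        # Multiplicative-weights update.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    best = min(sum(l[i] for l in losses) for i in range(n))
    return learner_loss - best

T, n = 400, 8
# Arbitrary 0/1 loss sequence; expert 0 follows a different pattern.
losses = [[float((t * (i + 1)) % 2) if i else float(t % 3 == 0)
           for i in range(n)] for t in range(T)]
eta = math.sqrt(8 * math.log(n) / T)  # tuned learning rate
regret = hedge_expected_regret(losses, eta)
print(regret <= math.sqrt(T * math.log(n) / 2))  # Hedge guarantee holds
```

The point of the sketch is only the sqrt(T) scaling of the regret bound; the paper's contribution is replacing log|H| with its new dimension, which can be finite even for infinite classes.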
🛡️ Threat Analysis
The core problem is learning classifiers that remain accurate on adversarially perturbed inputs at inference time: an adversary chooses the perturbation specifically to cause misprediction. The work thus addresses the theoretical foundations of robustness to adversarial examples.