Defense · 2025

Towards Provably Unlearnable Examples via Bayes Error Optimisation

Ruihan Zhang , Jun Sun , Ee-Peng Lim , Peixin Zhang

0 citations · 36 references · arXiv


Published on arXiv: 2511.08191

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

In a 50/50 clean–unlearnable mix on CIFAR-10, prior methods let test accuracy recover to 92.51% (defeating their purpose), while BEO holds it to 69.68%, versus 91.16% for clean-only training.

Bayes Error Optimisation (BEO)

Novel technique introduced


The recent success of machine learning models, especially large-scale classifiers and language models, relies heavily on training with massive data. These data are often collected from online sources. This raises serious concerns about the protection of user data, as individuals may not have given consent for their data to be used in training. To address this concern, recent studies introduce the concept of unlearnable examples, i.e., data instances that appear natural but are intentionally altered to prevent models from effectively learning from them. While existing methods demonstrate empirical effectiveness, they typically rely on heuristic trials and lack formal guarantees. Moreover, when unlearnable examples are mixed with clean data, as is often the case in practice, their unlearnability disappears. In this work, we propose a novel approach to constructing unlearnable examples by systematically maximising the Bayes error, a measure of irreducible classification error. We develop an optimisation-based approach and provide an efficient solution using projected gradient ascent. Our method provably increases the Bayes error and remains effective when the unlearnable examples are mixed with clean samples. Experimental results across multiple datasets and model architectures are consistent with our theoretical analysis and show that our approach can effectively restrict data learnability in practice.
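The abstract describes an optimisation-based construction solved with projected gradient ascent under a norm-bounded perturbation constraint. As a rough illustration only, the sketch below shows the generic projected-gradient-ascent loop with an L-infinity projection; the `grad_fn` objective is a placeholder standing in for the paper's Bayes-error objective, which is not specified in this summary, and the `epsilon`/`step`/`steps` values are assumed, not taken from the paper.

```python
import numpy as np

def projected_gradient_ascent(x, grad_fn, epsilon=8/255, step=2/255, steps=50):
    """Generic PGA sketch: maximise a surrogate objective (a stand-in for the
    paper's Bayes-error objective) subject to an L-infinity bound of epsilon
    around the clean sample x, keeping pixel values in [0, 1]."""
    x = np.asarray(x, dtype=np.float64)
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)                      # gradient of the objective w.r.t. the input
        delta = delta + step * np.sign(g)           # signed ascent step
        delta = np.clip(delta, -epsilon, epsilon)   # project back into the L-inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x    # keep the perturbed image in valid range
    return x + delta
```

With a constant-gradient toy objective, the perturbation saturates at the epsilon boundary, which is the behaviour the projection step is meant to enforce.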


Key Contributions

  • Novel unlearnable-examples framework grounded in Bayes error maximization via projected gradient ascent, providing formal theoretical guarantees unlike prior heuristic methods
  • Provable increase in irreducible classification error under norm-bounded perturbation constraints
  • Demonstrated effectiveness when unlearnable examples are mixed with clean data — resolving a critical practical limitation of existing methods

🛡️ Threat Analysis

Data Poisoning Attack

Unlearnable examples are a defensive form of training-data corruption: the method deliberately perturbs data at training time to degrade model learnability, operating through the same vector as data poisoning (manipulating training data to cause poor model performance). The threat model casts an unauthorized ML trainer as the adversary and the data owner as the defender.


Details

Domains
vision
Model Types
CNN, Transformer
Threat Tags
training_time
Datasets
CIFAR-10
Applications
image classification, data protection against unauthorized ML training