Defense · 2025

Towards Provably Unlearnable Examples via Bayes Error Optimisation

Ruihan Zhang , Jun Sun , Ee-Peng Lim , Peixin Zhang

0 citations · 36 references · arXiv


Published on arXiv: 2511.08191

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

In a 50/50 clean–unlearnable mix on CIFAR-10, prior methods let test accuracy recover to 92.51% (defeating their purpose), while BEO holds it to 69.68%, versus 91.16% for clean-only training.

Bayes Error Optimisation (BEO)

Novel technique introduced


The recent success of machine learning models, especially large-scale classifiers and language models, relies heavily on training with massive data. These data are often collected from online sources. This raises serious concerns about the protection of user data, as individuals may not have given consent for their data to be used in training. To address this concern, recent studies introduce the concept of unlearnable examples, i.e., data instances that appear natural but are intentionally altered to prevent models from effectively learning from them. While existing methods demonstrate empirical effectiveness, they typically rely on heuristic trials and lack formal guarantees. Moreover, when unlearnable examples are mixed with clean data, as is often the case in practice, their unlearnability disappears. In this work, we propose a novel approach to constructing unlearnable examples by systematically maximising the Bayes error, a measure of irreducible classification error. We develop an optimisation-based approach and provide an efficient solution using projected gradient ascent. Our method provably increases the Bayes error and remains effective when the unlearnable examples are mixed with clean samples. Experimental results across multiple datasets and model architectures are consistent with our theoretical analysis and show that our approach can effectively restrict data learnability in practice.
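The abstract describes an optimisation-based construction solved with projected gradient ascent under a norm-bounded perturbation constraint. As a rough illustration only, the sketch below shows the generic projected-gradient-ascent loop with an L-infinity projection; the `grad_fn` objective is a placeholder standing in for the paper's Bayes-error objective, which is not specified in this summary, and the `epsilon`/`step`/`steps` values are assumed, not taken from the paper.

```python
import numpy as np

def projected_gradient_ascent(x, grad_fn, epsilon=8/255, step=2/255, steps=50):
    """Generic PGA sketch: maximise a surrogate objective (a stand-in for the
    paper's Bayes-error objective) subject to an L-infinity bound of epsilon
    around the clean sample x, keeping pixel values in [0, 1]."""
    x = np.asarray(x, dtype=np.float64)
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)                      # gradient of the objective w.r.t. the input
        delta = delta + step * np.sign(g)           # signed ascent step
        delta = np.clip(delta, -epsilon, epsilon)   # project back into the L-inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x    # keep the perturbed image in valid range
    return x + delta
```

With a constant-gradient toy objective, the perturbation saturates at the epsilon boundary, which is the behaviour the projection step is meant to enforce.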


Key Contributions

  • Novel unlearnable-examples framework grounded in Bayes error maximization via projected gradient ascent, providing formal theoretical guarantees unlike prior heuristic methods
  • Provable increase in irreducible classification error under norm-bounded perturbation constraints
  • Demonstrated effectiveness when unlearnable examples are mixed with clean data — resolving a critical practical limitation of existing methods

🛡️ Threat Analysis

Data Poisoning Attack

Unlearnable examples are a defensive form of training-data corruption: the method deliberately perturbs data at training time to degrade model learnability, operating through the same vector as data poisoning (manipulating training data to cause poor model performance). The threat model casts an unauthorized ML trainer as the adversary and the data owner as the defender.


Details

Domains
vision
Model Types
CNN, Transformer
Threat Tags
training_time
Datasets
CIFAR-10
Applications
image classification, data protection against unauthorized ML training