defense 2026

Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information

Yifan Zhu 1,2, Yibo Miao 1,2, Yinpeng Dong 3,4, Xiao-Shan Gao 1,2


Published on arXiv

2603.03725

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

MI-UE significantly outperforms prior unlearnable example methods in preventing unauthorized model generalization, remaining effective even under known defense mechanisms

MI-UE (Mutual Information Unlearnable Examples)

Novel technique introduced


The volume of freely scraped data on the Internet has driven the tremendous success of deep learning, and with it growing concern about data privacy and security. Numerous methods for generating unlearnable examples have been proposed to prevent data from being illicitly learned by unauthorized deep models by impeding generalization. However, existing approaches rely primarily on empirical heuristics, making it difficult to improve unlearnable examples in a principled way. In this paper, we analyze and improve unlearnable examples from a novel perspective: mutual information reduction. We demonstrate that effective unlearnable examples always decrease the mutual information between clean features and poisoned features, and that as the network gets deeper, unlearnability improves in step with lower mutual information. Further, we prove from a covariance reduction perspective that minimizing the conditional covariance of intra-class poisoned features reduces the mutual information between the two distributions. Based on these theoretical results, we propose a novel unlearnable method, Mutual Information Unlearnable Examples (MI-UE), that reduces covariance by maximizing the cosine similarity among intra-class features, thus impeding generalization effectively. Extensive experiments demonstrate that our approach significantly outperforms previous methods, even under defense mechanisms.
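One standard information-theoretic fact makes the covariance-to-mutual-information link plausible (an illustrative sketch; the paper's own bound may take a different form). Writing $C$ for clean features and $P$ for poisoned features:

```latex
I(C;P) \;=\; h(P) - h(P \mid C) \;\le\; h(P) \;\le\; \tfrac{1}{2}\log\!\bigl((2\pi e)^d \det \Sigma_P\bigr),
```

where the last step is the Gaussian maximum-entropy bound for a $d$-dimensional variable with covariance $\Sigma_P$. Any procedure that shrinks the covariance of the poisoned features therefore tightens this entropy upper bound on the mutual information.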


Key Contributions

  • Theoretical framework showing that effective unlearnable examples reduce mutual information between clean and poisoned feature distributions, with mutual information as an upper bound on generalization error
  • Covariance reduction proof demonstrating that minimizing conditional covariance of intra-class poisoned features reduces mutual information between distributions
  • MI-UE method that maximizes cosine similarity among intra-class features to minimize covariance and impede generalization, outperforming prior methods even under defenses
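The intra-class alignment objective described in the last bullet can be sketched roughly as follows. This is a hypothetical NumPy illustration (the function name and shapes are made up, not the authors' code): maximizing pairwise cosine similarity within each class pulls that class's features toward a common direction, shrinking its conditional covariance.

```python
import numpy as np

def intra_class_cosine_loss(features, labels):
    """Mean negative pairwise cosine similarity within each class.

    Minimizing this loss maximizes intra-class feature alignment,
    which shrinks the conditional covariance of each class's features.
    `features` has shape (n, d); `labels` has shape (n,).
    """
    loss, n_classes = 0.0, 0
    for c in np.unique(labels):
        f = features[labels == c]
        if len(f) < 2:
            continue
        f = f / np.linalg.norm(f, axis=1, keepdims=True)  # unit-norm rows
        sim = f @ f.T                                     # pairwise cosine sims
        n = len(f)
        off_diag = (sim.sum() - n) / (n * (n - 1))        # mean off-diagonal sim
        loss -= off_diag                                  # maximize similarity
        n_classes += 1
    return loss / max(n_classes, 1)
```

When all features in a class are identical, the loss reaches its minimum of -1; orthogonal intra-class features give 0.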

🛡️ Threat Analysis

Data Poisoning Attack

MI-UE crafts imperceptible perturbations added to training data to degrade the generalization of any unauthorized model trained on it. This is defensive availability poisoning, and it falls directly within the data poisoning threat model.
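The general availability-poisoning recipe can be illustrated with a toy error-minimizing perturbation, in the style of earlier unlearnable-example work (a hypothetical sketch, not the MI-UE algorithm): iteratively push a bounded perturbation in the direction that reduces training loss, making the examples "too easy" to fit.

```python
import numpy as np

def craft_availability_noise(x, y, w, eps=8 / 255, alpha=2 / 255, steps=10):
    """Toy error-minimizing perturbation against a fixed linear model w.

    Each step moves `delta` to REDUCE the squared training error
    (the opposite of an adversarial attack), while keeping the
    perturbation inside an L-infinity ball of radius `eps`.
    `x` has shape (n, d); `y` has shape (n,); `w` has shape (d,).
    """
    delta = np.zeros_like(x)
    for _ in range(steps):
        pred = (x + delta) @ w
        grad = np.outer(2 * (pred - y), w)  # d(loss)/d(delta), per sample
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return delta
```

In practice, unlearnable-example methods alternate this inner perturbation step with model updates (a bi-level optimization); the sketch above fixes the model for clarity.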


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, white_box, digital, untargeted
Datasets
CIFAR-10, ImageNet
Applications
image classification, data protection from unauthorized training