Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
Yifan Zhu 1,2, Yibo Miao 1,2, Yinpeng Dong 3,4, Xiao-Shan Gao 1,2
Published on arXiv
2603.03725
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
MI-UE significantly outperforms prior unlearnable-example methods at preventing unauthorized model generalization, and it remains effective even under known defense mechanisms.
MI-UE (Mutual Information Unlearnable Examples)
Novel technique introduced
The volume of freely scraped data on the Internet has driven the tremendous success of deep learning, and with it growing concern about data privacy and security. Numerous methods for generating unlearnable examples have been proposed to prevent data from being illicitly learned by unauthorized deep models by impeding generalization. However, the existing approaches rely primarily on empirical heuristics, making it challenging to improve unlearnable examples in a principled way. In this paper, we analyze and improve unlearnable examples from a novel perspective: mutual information reduction. We demonstrate that effective unlearnable examples always decrease the mutual information between clean features and poisoned features, and that as networks get deeper, unlearnability improves in tandem with lower mutual information. Further, we prove from a covariance-reduction perspective that minimizing the conditional covariance of intra-class poisoned features reduces the mutual information between the distributions. Based on these theoretical results, we propose a novel unlearnable method called Mutual Information Unlearnable Examples (MI-UE), which reduces covariance by maximizing the cosine similarity among intra-class features, thereby effectively impeding generalization. Extensive experiments demonstrate that our approach significantly outperforms previous methods, even under defense mechanisms.
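The core objective described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a batch of feature vectors with class labels and expresses "maximize intra-class cosine similarity" as a loss to minimize (the negative mean pairwise similarity within each class). The function name and batch layout are hypothetical.

```python
import numpy as np

def intra_class_cosine_loss(features, labels):
    """Hypothetical sketch of the MI-UE objective: return the negative
    mean pairwise cosine similarity among same-class features, so that
    minimizing this loss aligns intra-class features."""
    # Normalize each feature to unit length so dot products are cosines.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    total, n_classes = 0.0, 0
    for c in np.unique(labels):
        f = normed[labels == c]          # unit-norm features of class c
        n = len(f)
        if n < 2:
            continue
        sim = f @ f.T                    # pairwise cosine similarities
        # Mean over off-diagonal entries (exclude self-similarity = 1).
        total += (sim.sum() - n) / (n * (n - 1))
        n_classes += 1
    return -total / n_classes            # negative: lower loss = more aligned
```

When all features in a class point in the same direction, the per-class mean similarity is 1 and the loss reaches its minimum of -1; orthogonal intra-class features give a loss of 0.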
Key Contributions
- Theoretical framework showing that effective unlearnable examples reduce the mutual information between clean and poisoned feature distributions, with mutual information serving as an upper bound on generalization error
- Covariance reduction proof demonstrating that minimizing conditional covariance of intra-class poisoned features reduces mutual information between distributions
- MI-UE method that maximizes cosine similarity among intra-class features to minimize covariance and impede generalization, outperforming prior methods even under defenses
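The covariance-reduction argument in the second bullet can be illustrated numerically: pulling intra-class features toward a common direction shrinks their conditional covariance. The snippet below is a toy demonstration of that effect, not the paper's proof; the alignment step simply blends each feature toward the class mean as a stand-in for maximizing intra-class cosine similarity.

```python
import numpy as np

def intra_class_cov_trace(features):
    """Trace of the sample covariance of one class's features --
    a scalar proxy for the conditional covariance being minimized."""
    return np.trace(np.cov(features, rowvar=False))

rng = np.random.default_rng(0)

# Toy features for a single class, spread in random directions.
spread = rng.normal(size=(100, 8))

# "Aligned" features: blend each feature toward the class mean,
# mimicking the effect of maximizing intra-class cosine similarity.
mean_dir = spread.mean(axis=0)
aligned = 0.2 * spread + 0.8 * mean_dir

print(intra_class_cov_trace(aligned) < intra_class_cov_trace(spread))  # True
```

Because the aligned features retain only a 0.2 factor of the original deviations, their covariance trace shrinks by a factor of 0.04, consistent with the claim that tighter intra-class alignment lowers conditional covariance.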
🛡️ Threat Analysis
MI-UE adds imperceptible, carefully crafted perturbations to training data so that any unauthorized model trained on it generalizes poorly. This is defensive availability poisoning and falls squarely within the data poisoning threat model.