
Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten

Wei Qian, Chenxu Zhao, Yangyi Li, Wenqian Ye, Mengdi Huai



Published on arXiv (arXiv:2508.07458)

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Attacks that target predictive uncertainty via malicious unlearning requests are more effective than traditional label-misclassification attacks and evade all evaluated conventional defenses.

Malicious Unlearning Attack on Predictive Uncertainty

Novel technique introduced


Various uncertainty quantification methods have been proposed to provide confidence and probability estimates for the label predictions of deep learning models. Meanwhile, driven by the growing demand for the right to be forgotten, machine unlearning has been extensively studied as a means of removing the influence of requested sensitive data from a pre-trained model without retraining it from scratch. However, the vulnerability of these predictive uncertainty estimates to dedicated malicious unlearning attacks remains unexplored. To bridge this gap, we propose, for the first time, a new class of malicious unlearning attacks against predictive uncertainty, in which the adversary aims to induce desired manipulations of specific predictive uncertainty results. We also design novel optimization frameworks for our attacks and conduct extensive experiments, including in black-box scenarios. Notably, our experiments show that our attacks are more effective at manipulating predictive uncertainties than traditional attacks that target label misclassification, and that existing defenses against conventional attacks are ineffective against ours.


Key Contributions

  • First study of malicious unlearning attacks specifically targeting predictive uncertainty outputs (e.g., Bayesian neural networks, softmax confidence, deep ensembles) rather than label predictions
  • Novel optimization frameworks for crafting malicious unlearning requests that induce the desired uncertainty results while preserving the original predicted labels for stealthiness
  • Empirical demonstration that uncertainty-targeted unlearning attacks outperform traditional label-misclassification attacks and bypass existing defenses, including in black-box settings
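The uncertainty measures named in the first bullet can be made concrete. The following sketch (not from the paper; the logits are made-up values for a hypothetical 3-member ensemble) computes three common uncertainty outputs an attacker could target: softmax confidence, predictive entropy, and ensemble disagreement (the mutual-information decomposition of total uncertainty).

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# logits from a hypothetical 3-member ensemble for one input, 4 classes
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [1.8, 0.9, 0.0, -0.5],
                   [2.2, 0.2, 0.3, -0.8]])
probs = softmax(logits)          # per-member predictive distributions
mean_p = probs.mean(axis=0)      # ensemble-averaged prediction

confidence = mean_p.max()                                # softmax confidence
pred_entropy = -(mean_p * np.log(mean_p + 1e-12)).sum()  # predictive entropy
# disagreement = total uncertainty minus average per-member entropy
member_H = -(probs * np.log(probs + 1e-12)).sum(axis=1).mean()
disagreement = pred_entropy - member_H
```

An attack in the paper's threat model would steer one of these quantities (e.g., drive `pred_entropy` up for a chosen input) without flipping `mean_p.argmax()`.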

🛡️ Threat Analysis

Data Poisoning Attack

The adversary submits crafted malicious unlearning requests — analogous to poisoning the model update process — to corrupt predictive uncertainty outputs while preserving label predictions. This is a training/update-time poisoning attack that degrades model behavior through a controlled data removal interface, making it a form of data/model poisoning exploitation.
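As a toy illustration of this threat model (not the paper's optimization framework), the sketch below greedily selects unlearning requests against a tiny logistic-regression classifier. "Exact unlearning" is simulated by retraining from scratch on the remaining data; each accepted request must raise the target input's predictive entropy while leaving its predicted label unchanged, matching the stealth constraint described above. All data and model choices are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1))

def train(X, y, lr=0.5, steps=100):
    # exact unlearning baseline: retrain a 2-class logistic model from scratch
    W = np.zeros((X.shape[1], 2))
    Y = np.eye(2)[y]
    for _ in range(steps):
        W -= lr * X.T @ (softmax(X @ W) - Y) / len(X)
    return W

# toy data: two Gaussian blobs
X = np.vstack([rng.normal(-1, 1, (60, 2)), rng.normal(1, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
target = np.array([[1.5, 0.0]])   # input whose uncertainty the attacker steers

keep = np.ones(len(X), dtype=bool)
p = softmax(target @ train(X, y))[0]
label0, H = int(p.argmax()), entropy(p)
H_clean = H

# greedy "malicious unlearning": request removals that raise the target's
# predictive entropy while keeping its predicted label fixed (stealth)
for _ in range(8):
    best_i, best_H = None, H
    for i in np.flatnonzero(keep):
        trial = keep.copy()
        trial[i] = False
        p_t = softmax(target @ train(X[trial], y[trial]))[0]
        if int(p_t.argmax()) == label0 and entropy(p_t) > best_H:
            best_i, best_H = i, entropy(p_t)
    if best_i is None:
        break                     # no single removal improves the objective
    keep[best_i] = False
    H = best_H
```

The greedy search stands in for the paper's gradient-based optimization; the key point is the objective shape: maximize an uncertainty functional of the post-unlearning model subject to a label-preservation constraint.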


Details

Domains
vision
Model Types
cnn, transformer, traditional_ml
Threat Tags
black_box, training_time, targeted
Datasets
CIFAR-10, MNIST
Applications
image classification, uncertainty quantification