defense 2026

Efficient Unlearning through Maximizing Relearning Convergence Delay

Khoa Tran 1, Simon S. Woo 1,2


Published on arXiv

2604.09391

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Achieves longest relearning convergence delay among compared methods while maintaining retention accuracy, demonstrating stronger resistance to data recovery attacks

Influence Eliminating Unlearning

Novel technique introduced


Machine unlearning poses challenges in removing mislabeled, contaminated, or otherwise problematic data from a pretrained model. Current unlearning approaches and evaluation metrics focus solely on model predictions, which limits insight into the model's true underlying data characteristics. To address this issue, we introduce a new metric called relearning convergence delay, which captures changes in both weight space and prediction space, providing a more comprehensive assessment of the model's understanding of the forgotten dataset. This metric can be used to assess the risk of forgotten data being recovered from the unlearned model. Based on this, we propose the Influence Eliminating Unlearning framework, which removes the influence of the forgetting set by degrading its performance and incorporates weight decay and noise injection into the model's weights, while maintaining accuracy on the retaining set. Extensive experiments show that our method outperforms existing approaches on both standard metrics and our proposed relearning convergence delay metric, approaching ideal unlearning performance. We provide theoretical guarantees, including exponential convergence and upper bounds, as well as empirical evidence of strong retention and resistance to relearning in both classification and generative unlearning tasks.
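The abstract's recipe (ascend on the forgetting set, descend on the retaining set, decay the weights, inject noise) can be sketched for a toy linear/MSE model. This is a hypothetical simplification, not the paper's implementation; the function name `unlearning_step` and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def unlearning_step(w, X_forget, y_forget, X_retain, y_retain,
                    lr=0.1, wd=0.01, noise_std=1e-3, rng=None):
    """One sketched update in the spirit of Influence Eliminating
    Unlearning (hypothetical simplification for a linear model):
    gradient *ascent* on the forget-set loss, gradient descent on the
    retain-set loss, weight decay, then Gaussian noise injection."""
    rng = rng if rng is not None else np.random.default_rng(0)

    def grad_mse(w, X, y):
        # Gradient of mean squared error for predictions X @ w.
        return 2.0 * X.T @ (X @ w - y) / len(y)

    w = w + lr * grad_mse(w, X_forget, y_forget)   # degrade forget-set fit
    w = w - lr * grad_mse(w, X_retain, y_retain)   # preserve retain-set fit
    w = (1.0 - wd) * w                             # weight decay
    w = w + rng.normal(0.0, noise_std, size=w.shape)  # noise injection
    return w
```

In a real setting the ascent and descent terms would typically be weighted separately and applied over many mini-batch steps; the single combined update above is only meant to show the four ingredients side by side.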


Key Contributions

  • Novel 'relearning convergence delay' metric measuring how long it takes to relearn forgotten data from unlearned models
  • Influence Eliminating Unlearning framework combining performance degradation on forgetting set with weight decay and noise injection
  • Theoretical guarantees (exponential convergence, upper bounds) and empirical validation on classification and generative tasks
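The headline metric can be made concrete with a small sketch: fine-tune the unlearned model on the forgetting set and count the gradient steps needed to reach a target loss. The function below is a hypothetical instantiation for a linear/MSE model, assuming full-batch gradient descent; the name `relearning_delay` and the thresholding scheme are assumptions, not the paper's exact protocol.

```python
import numpy as np

def relearning_delay(w, X_forget, y_forget, loss_target,
                     lr=0.1, max_steps=1000):
    """Count full-batch gradient steps until the forget set is re-fit
    to `loss_target`. A longer delay suggests the unlearned weights
    retain less usable information about the forgotten data.
    (Hypothetical sketch of the relearning convergence delay metric.)"""
    def mse(w):
        return float(np.mean((X_forget @ w - y_forget) ** 2))

    for step in range(max_steps):
        if mse(w) <= loss_target:
            return step  # converged after `step` relearning updates
        grad = 2.0 * X_forget.T @ (X_forget @ w - y_forget) / len(y_forget)
        w = w - lr * grad
    return max_steps  # did not converge within the budget
```

Comparing this count across unlearning methods (larger is better, up to the budget) is the core idea: a model that truly forgot the data should take longer to relearn it than one that merely suppressed its predictions.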

🛡️ Threat Analysis

Model Inversion Attack

The paper explicitly frames the threat model as adversaries attempting to recover forgotten data from the unlearned model through relearning. The relearning convergence delay metric measures resistance to such data reconstruction attacks. Unlike purely compliance-focused unlearning, this paper evaluates the security property of preventing data recovery.


Details

Domains
vision
Model Types
CNN, diffusion
Threat Tags
training_time
Datasets
CIFAR-10, CIFAR-100, CelebA
Applications
image classification, generative modeling