Efficient Unlearning through Maximizing Relearning Convergence Delay
Khoa Tran 1, Simon S. Woo 1,2
Published on arXiv
2604.09391
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Achieves longest relearning convergence delay among compared methods while maintaining retention accuracy, demonstrating stronger resistance to data recovery attacks
Influence Eliminating Unlearning
Novel technique introduced
Machine unlearning poses challenges in removing mislabeled, contaminated, or otherwise problematic data from a pretrained model. Current unlearning approaches and evaluation metrics focus solely on model predictions, which limits insight into the model's true underlying data characteristics. To address this, we introduce a new metric called relearning convergence delay, which captures changes in both weight space and prediction space, providing a more comprehensive assessment of the model's retained knowledge of the forgotten dataset. This metric can be used to assess the risk of forgotten data being recovered from the unlearned model. Building on it, we propose the Influence Eliminating Unlearning framework, which removes the influence of the forgetting set by degrading its performance and incorporates weight decay and noise injection into the model's weights, while maintaining accuracy on the retaining set. Extensive experiments show that our method outperforms existing approaches on both established metrics and our proposed relearning convergence delay metric, approaching ideal unlearning performance. We provide theoretical guarantees, including exponential convergence and upper bounds, as well as empirical evidence of strong retention and resistance to relearning in both classification and generative unlearning tasks.
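The relearning convergence delay metric can be illustrated with a minimal sketch: fine-tune the unlearned model on the forget set and count how many gradient steps it takes to reach a target accuracy again. The toy logistic-regression setup, the `acc_target` threshold, and all function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def relearn_delay(w, X_forget, y_forget, lr=0.5, acc_target=0.95, max_steps=1000):
    """Toy proxy for relearning convergence delay: number of logistic-loss
    gradient steps on the forget set until accuracy reaches acc_target.
    (Threshold and learning rate are illustrative, not from the paper.)"""
    w = w.copy()
    for step in range(max_steps):
        p = 1.0 / (1.0 + np.exp(-X_forget @ w))            # sigmoid predictions
        if np.mean((p > 0.5) == (y_forget == 1)) >= acc_target:
            return step                                     # delay = steps taken so far
        # one relearning step: standard logistic-loss gradient descent
        w -= lr * X_forget.T @ (p - y_forget) / len(y_forget)
    return max_steps                                        # did not converge in budget
```

A model that still encodes the forget set (e.g., its original trained weights) converges immediately (delay 0), whereas a properly unlearned model (e.g., weights reset toward zero) needs one or more steps; a larger delay indicates stronger resistance to data recovery.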
Key Contributions
- Novel 'relearning convergence delay' metric measuring how long it takes to relearn forgotten data from unlearned models
- Influence Eliminating Unlearning framework combining performance degradation on forgetting set with weight decay and noise injection
- Theoretical guarantees (exponential convergence, upper bounds) and empirical validation on classification and generative tasks
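The framework's three ingredients named above can be sketched as a single parameter update: gradient ascent on the forget-set loss to degrade its performance, gradient descent on the retain-set loss to preserve accuracy, plus weight decay and Gaussian noise injection. This is a minimal sketch on a toy logistic model; the function name, hyperparameters, and loss are assumptions, not the paper's algorithm.

```python
import numpy as np

def ieu_step(w, Xf, yf, Xr, yr, lr=0.1, ascent=0.1, decay=0.01,
             noise_std=0.01, rng=None):
    """One illustrative Influence-Eliminating-style update (hyperparameter
    names and values are assumptions): ascend the forget-set loss, descend
    the retain-set loss, shrink weights, and inject Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng

    def grad(X, y, w):                       # logistic-loss gradient
        p = 1.0 / (1.0 + np.exp(-X @ w))
        return X.T @ (p - y) / len(y)

    w = w + ascent * grad(Xf, yf, w)         # degrade forget-set performance
    w = w - lr * grad(Xr, yr, w)             # maintain retain-set accuracy
    w = (1.0 - decay) * w                    # weight decay
    return w + rng.normal(0.0, noise_std, w.shape)  # noise injection
```

Weight decay and noise both perturb the model in weight space, which is exactly what the relearning convergence delay metric is designed to detect, whereas prediction-only metrics would miss it.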
🛡️ Threat Analysis
The paper explicitly frames its threat model as an adversary attempting to recover forgotten data from the unlearned model through relearning. The relearning convergence delay metric measures resistance to such data reconstruction attacks. Unlike purely compliance-focused unlearning, this paper evaluates the security property of preventing data recovery.