Efficient Unlearning through Maximizing Relearning Convergence Delay
Khoa Tran 1, Simon S. Woo 1,2
Published on arXiv
2604.09391
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Achieves longest relearning convergence delay among compared methods while maintaining retention accuracy, demonstrating stronger resistance to data recovery attacks
Influence Eliminating Unlearning
Novel technique introduced
Machine unlearning poses challenges in removing mislabeled, contaminated, or otherwise problematic data from a pretrained model. Current unlearning approaches and evaluation metrics focus solely on model predictions, which limits insight into the model's true underlying data characteristics. To address this, we introduce a new metric called relearning convergence delay, which captures changes in both weight space and prediction space, providing a more comprehensive assessment of the model's retained knowledge of the forgotten dataset. This metric can be used to assess the risk of forgotten data being recovered from the unlearned model. Building on it, we propose the Influence Eliminating Unlearning framework, which removes the influence of the forgetting set by degrading its performance and incorporates weight decay and noise injection into the model's weights, while maintaining accuracy on the retaining set. Extensive experiments show that our method outperforms existing approaches on both established metrics and our proposed relearning convergence delay metric, approaching ideal unlearning performance. We provide theoretical guarantees, including exponential convergence and upper bounds, as well as empirical evidence of strong retention and resistance to relearning in both classification and generative unlearning tasks.
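The relearning convergence delay metric can be illustrated with a minimal sketch: fine-tune the unlearned model on the forget set and count how many gradient steps it takes to reach a target accuracy again. The toy logistic-regression setup, the `acc_target` threshold, and all function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def relearn_delay(w, X_forget, y_forget, lr=0.5, acc_target=0.95, max_steps=1000):
    """Toy proxy for relearning convergence delay: number of logistic-loss
    gradient steps on the forget set until accuracy reaches acc_target.
    (Threshold and learning rate are illustrative, not from the paper.)"""
    w = w.copy()
    for step in range(max_steps):
        p = 1.0 / (1.0 + np.exp(-X_forget @ w))            # sigmoid predictions
        if np.mean((p > 0.5) == (y_forget == 1)) >= acc_target:
            return step                                     # delay = steps taken so far
        # one relearning step: standard logistic-loss gradient descent
        w -= lr * X_forget.T @ (p - y_forget) / len(y_forget)
    return max_steps                                        # did not converge in budget
```

A model that still encodes the forget set (e.g., its original trained weights) converges immediately (delay 0), whereas a properly unlearned model (e.g., weights reset toward zero) needs one or more steps; a larger delay indicates stronger resistance to data recovery.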
Key Contributions
- Novel 'relearning convergence delay' metric measuring how long it takes to relearn forgotten data from unlearned models
- Influence Eliminating Unlearning framework combining performance degradation on forgetting set with weight decay and noise injection
- Theoretical guarantees (exponential convergence, upper bounds) and empirical validation on classification and generative tasks
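The framework's three ingredients named above can be sketched as a single parameter update: gradient ascent on the forget-set loss to degrade its performance, gradient descent on the retain-set loss to preserve accuracy, plus weight decay and Gaussian noise injection. This is a minimal sketch on a toy logistic model; the function name, hyperparameters, and loss are assumptions, not the paper's algorithm.

```python
import numpy as np

def ieu_step(w, Xf, yf, Xr, yr, lr=0.1, ascent=0.1, decay=0.01,
             noise_std=0.01, rng=None):
    """One illustrative Influence-Eliminating-style update (hyperparameter
    names and values are assumptions): ascend the forget-set loss, descend
    the retain-set loss, shrink weights, and inject Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng

    def grad(X, y, w):                       # logistic-loss gradient
        p = 1.0 / (1.0 + np.exp(-X @ w))
        return X.T @ (p - y) / len(y)

    w = w + ascent * grad(Xf, yf, w)         # degrade forget-set performance
    w = w - lr * grad(Xr, yr, w)             # maintain retain-set accuracy
    w = (1.0 - decay) * w                    # weight decay
    return w + rng.normal(0.0, noise_std, w.shape)  # noise injection
```

Weight decay and noise both perturb the model in weight space, which is exactly what the relearning convergence delay metric is designed to detect, whereas prediction-only metrics would miss it.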
🛡️ Threat Analysis
The paper explicitly frames its threat model as an adversary attempting to recover forgotten data from the unlearned model through relearning. The relearning convergence delay metric measures resistance to such data reconstruction attacks. Unlike purely compliance-focused unlearning, this paper evaluates the security property of preventing data recovery.