Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic
Mostafa Mozafari, Farooq Ahmad Wani, Maria Sofia Bucarelli, Fabrizio Silvestri
Published on arXiv
arXiv:2511.18660
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
CUTS nearly eliminates backdoor attack success rates and recovers a large fraction of label-noise utility without access to clean data or original training samples, outperforming state-of-the-art source-free CMU methods.
CUTS (Corrective Unlearning in Task Space)
Novel technique introduced
Corrupted training data are ubiquitous. Corrective Machine Unlearning (CMU) seeks to remove the influence of such corruption post-training. Prior CMU methods typically assume access to identified corrupted training samples (a "forget set"). However, in many real-world scenarios the training data are no longer accessible. We formalize source-free CMU, where the original training data are unavailable and, consequently, no forget set of identified corrupted training samples can be specified. Instead, we assume a small proxy (surrogate) set of corrupted samples that reflect the suspected corruption type without needing to be the original training samples. In this stricter setting, methods that rely on a forget set are ineffective or narrow in scope. We introduce Corrective Unlearning in Task Space (CUTS), a lightweight weight-space correction method guided by the proxy set using task-arithmetic principles. CUTS treats the clean signal and the corruption signal as distinct tasks. Specifically, we briefly fine-tune the corrupted model on the proxy set to amplify the corruption mechanism in weight space, compute the difference between the fine-tuned and corrupted weights as a proxy task vector, and subtract a calibrated multiple of this vector to cancel the corruption. Without access to clean data or a forget set, CUTS recovers a large fraction of the lost utility under label noise and, for backdoor triggers, nearly eliminates the attack with minimal damage to utility, outperforming state-of-the-art specialized CMU methods in the source-free setting.
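The three steps above (brief fine-tuning on the proxy, forming the proxy task vector, subtracting a calibrated multiple) can be sketched in weight space. This is a minimal illustration using a tiny NumPy softmax classifier, not the paper's actual models or hyperparameters; `alpha`, the learning rate, and the step count are illustrative placeholders.

```python
# Minimal sketch of CUTS on a linear softmax classifier in NumPy.
# The model, proxy data, and alpha are illustrative, not the paper's setup.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune(W, X, y, steps=50, lr=0.1):
    """Briefly fine-tune on the proxy set to amplify the corruption signal."""
    W = W.copy()
    n, k = X.shape[0], W.shape[1]
    Y = np.eye(k)[y]                      # one-hot (corrupted) proxy labels
    for _ in range(steps):
        P = softmax(X @ W)                # predicted class probabilities
        W -= lr * X.T @ (P - Y) / n       # cross-entropy gradient step
    return W

def cuts_correct(W_corrupt, X_proxy, y_proxy, alpha=1.0):
    # 1) briefly fine-tune the corrupted weights on the proxy set
    W_ft = finetune(W_corrupt, X_proxy, y_proxy)
    # 2) proxy task vector: fine-tuned minus corrupted weights
    tau = W_ft - W_corrupt
    # 3) subtract a calibrated multiple of tau to cancel the corruption
    return W_corrupt - alpha * tau
```

With `alpha=0` the corrupted model is returned unchanged; larger values push the weights further against the direction that the proxy fine-tuning amplified.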
Key Contributions
- Formalizes source-free Corrective Machine Unlearning (CMU) where neither the original training data nor a labeled forget set is available — only a small proxy set of corrupted samples
- Introduces CUTS (Corrective Unlearning in Task Space), which fine-tunes the corrupted model on the proxy to amplify corruption in weight space, computes a proxy task vector, and subtracts a calibrated multiple to cancel the corruption signal
- Demonstrates CUTS nearly eliminates backdoor attacks and significantly recovers utility under label noise without clean data, outperforming specialized state-of-the-art CMU baselines in the source-free setting
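The "calibrated multiple" in the second contribution implies choosing a scaling factor for the proxy task vector. The exact calibration criterion is not spelled out in this summary, so the sketch below is an assumption: it picks the smallest scale `alpha` that drives the corrected model's agreement with the corrupted proxy labels below a threshold, using agreement as a stand-in for attack success. `grid` and `threshold` are hypothetical parameters.

```python
# Hedged sketch of calibrating alpha for the subtraction step.
# Assumption: calibration uses only the proxy set (no clean data), selecting
# the smallest alpha whose corrected weights agree with the corrupted proxy
# labels at most `threshold` of the time.
import numpy as np

def proxy_agreement(W, X_proxy, y_proxy):
    """Fraction of proxy samples still predicted with the corrupted label."""
    preds = (X_proxy @ W).argmax(axis=1)
    return (preds == y_proxy).mean()

def calibrate_alpha(W_corrupt, tau, X_proxy, y_proxy,
                    grid=np.linspace(0.0, 2.0, 21), threshold=0.1):
    # tau is the proxy task vector (fine-tuned minus corrupted weights)
    for alpha in grid:
        W = W_corrupt - alpha * tau
        if proxy_agreement(W, X_proxy, y_proxy) <= threshold:
            return alpha
    return grid[-1]  # fall back to the largest alpha tried
```

A held-out criterion like this is one plausible way to trade off corruption removal against utility damage without any clean data.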
🛡️ Threat Analysis
CUTS directly defends against backdoor/trojan attacks (OWASP ML10): it amplifies the corruption in weight space via proxy fine-tuning, computes a proxy task vector, and subtracts a calibrated multiple of it to cancel the backdoor. The paper reports that this "nearly eliminates the attack" for backdoor triggers, making backdoor removal a primary contribution.
Label noise / label-flipping corruption is the other primary experimental setting: the paper defends against training-time data poisoning via corrupted labels (OWASP ML02) and recovers a large fraction of the lost utility, a direct ML02 defense contribution.