Defense · 2025

Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic

Mostafa Mozafari, Farooq Ahmad Wani, Maria Sofia Bucarelli, Fabrizio Silvestri

0 citations · 71 references · arXiv


Published on arXiv · 2511.18660

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

CUTS nearly eliminates backdoor attack success rates and recovers a large fraction of label-noise utility without access to clean data or original training samples, outperforming state-of-the-art source-free CMU methods.

CUTS (Corrective Unlearning in Task Space)

Novel technique introduced


Corrupted training data are ubiquitous. Corrective Machine Unlearning (CMU) seeks to remove the influence of such corruption post-training. Prior CMU work typically assumes access to identified corrupted training samples (a "forget set"). However, in many real-world scenarios the training data are no longer accessible. We formalize source-free CMU, where the original training data are unavailable and, consequently, no forget set of identified corrupted training samples can be specified. Instead, we assume a small proxy (surrogate) set of corrupted samples that reflects the suspected corruption type without needing to contain the original training samples. In this stricter setting, methods relying on a forget set are ineffective or narrow in scope. We introduce Corrective Unlearning in Task Space (CUTS), a lightweight weight-space correction method guided by the proxy set using task arithmetic principles. CUTS treats the clean signal and the corruption signal as distinct tasks. Specifically, we briefly fine-tune the corrupted model on the proxy set to amplify the corruption mechanism in weight space, compute the difference between the fine-tuned and corrupted weights as a proxy task vector, and subtract a calibrated multiple of this vector to cancel the corruption. Without access to clean data or a forget set, CUTS recovers a large fraction of the lost utility under label noise and, for backdoor triggers, nearly eliminates the attack with minimal damage to utility, outperforming state-of-the-art specialized CMU methods in the source-free setting.
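The correction step described above can be written compactly. The symbols below (θ_corr, θ_ft, τ_proxy, α) are our notation, not necessarily the paper's, with the sign convention chosen so that subtraction cancels the corruption:

    τ_proxy = θ_ft − θ_corr                (proxy task vector: fine-tuned minus corrupted weights)
    θ_corrected = θ_corr − α · τ_proxy     (subtract a calibrated multiple, α > 0)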


Key Contributions

  • Formalizes source-free Corrective Machine Unlearning (CMU) where neither the original training data nor a labeled forget set is available — only a small proxy set of corrupted samples
  • Introduces CUTS (Corrective Unlearning in Task Space), which fine-tunes the corrupted model on the proxy to amplify corruption in weight space, computes a proxy task vector, and subtracts a calibrated multiple to cancel the corruption signal
  • Demonstrates CUTS nearly eliminates backdoor attacks and significantly recovers utility under label noise without clean data, outperforming specialized state-of-the-art CMU baselines in the source-free setting
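The amplify-then-subtract recipe in the second bullet can be sketched on a toy linear least-squares model. Everything here (the `finetune_on_proxy`/`cuts_correct` names, the synthetic corruption setup, the choice alpha = 1) is illustrative, not taken from the paper:

```python
import numpy as np

def finetune_on_proxy(w, proxy_X, proxy_y, lr=0.1, steps=200):
    """Briefly fine-tune the corrupted weights on the proxy set,
    amplifying the corruption mechanism in weight space."""
    w = w.copy()
    for _ in range(steps):
        # Gradient of mean squared error for a linear model.
        grad = proxy_X.T @ (proxy_X @ w - proxy_y) / len(proxy_y)
        w -= lr * grad
    return w

def cuts_correct(corrupted_w, proxy_X, proxy_y, alpha=1.0):
    """CUTS sketch: proxy task vector = fine-tuned minus corrupted
    weights; subtract a calibrated multiple alpha of it."""
    ft_w = finetune_on_proxy(corrupted_w, proxy_X, proxy_y)
    tau = ft_w - corrupted_w            # proxy task vector
    return corrupted_w - alpha * tau    # weight-space correction

# Toy setup: corruption shifts the clean weights along direction d, and
# the proxy set reflects an amplified version of the same corruption.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_clean = np.array([1.0, -2.0, 0.5])
d = np.array([0.5, 0.5, -0.5])          # corruption direction
w_corrupted = w_clean + d
proxy_y = X @ (w_clean + 2 * d)         # proxy labels express the corruption
w_corrected = cuts_correct(w_corrupted, X, proxy_y, alpha=1.0)
```

In this toy, fine-tuning converges near the proxy optimum w_clean + 2d, so the proxy task vector is approximately d, and subtracting it lands the corrected weights close to w_clean. In the real method the analogous direction is estimated in a deep network's weight space and α must be calibrated rather than known.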

🛡️ Threat Analysis

Data Poisoning Attack

Label noise / label-flipping corruption is the second primary experimental setting explicitly addressed: the paper defends against training-time data poisoning (corrupted labels) and recovers a large fraction of the lost utility, a direct ML02 defense contribution.

Model Poisoning

CUTS directly defends against backdoor/trojan attacks by amplifying the corruption in weight space via proxy fine-tuning, computing a proxy task vector, and subtracting it to cancel the backdoor. The paper reports that CUTS "nearly eliminates the attack" for backdoor triggers, making backdoor removal a primary contribution.


Details

Domains
vision
Model Types
transformer
Threat Tags
training_time, targeted
Applications
image classification