Source Models Leak What They Shouldn't: Unlearning Zero-Shot Transfer in Domain Adaptation Through Adversarial Optimization
Arnav Devalapally 1,2, Poornima Jain 1, Kartik Srinivas 1,3, Vineeth N. Balasubramanian 1,4
Published on arXiv
2604.08238
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Achieves retraining-level unlearning performance while preventing zero-shot transfer of source-exclusive classes during domain adaptation
SCADA-UL
Novel problem setting introduced
The increasing adaptation of vision models across domains, such as satellite imagery and medical scans, raises an emerging privacy risk: models may inadvertently retain and leak sensitive source-domain-specific information in the target domain. This creates a compelling use case for machine unlearning (MU) to protect the privacy of sensitive source-domain data. Among adaptation techniques, source-free domain adaptation (SFDA) makes the need for MU especially urgent: the source data itself is protected, yet the source model exposed during adaptation still encodes its influence. Our experiments reveal that existing SFDA methods exhibit strong zero-shot performance on source-exclusive classes in the target domain, indicating that they inadvertently leak knowledge of these classes into the target domain even when those classes are not represented in the target data. We identify and address this risk by proposing an MU setting called SCADA-UL: Unlearning Source-exclusive ClAsses in Domain Adaptation. Existing MU methods do not address this setting, as they are not designed to handle data distribution shifts. We propose a new unlearning method in which adversarially generated forget-class samples are unlearned by the model during the domain adaptation process using a novel rescaled labeling strategy and adversarial optimization. We also extend our study to two variants: a continual version of this problem setting, and one where the specific source classes to be forgotten may be unknown. Alongside theoretical interpretations, our comprehensive empirical results show that our method consistently outperforms baselines in the proposed setting while achieving retraining-level unlearning performance on benchmark datasets. Our code is available at https://github.com/D-Arnav/SCADA
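The abstract names the mechanism only at a high level. As a rough illustration (not the authors' published algorithm), the sketch below shows one plausible instantiation in PyTorch: a PGD-style step synthesizes forget-class inputs from target-domain data, and the model is then pushed toward a rescaled label distribution with the forget-class probability mass removed. All function names, hyperparameters, and the KL-based loss are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pgd_towards_class(model, x, forget_class, steps=10, eps=8/255, alpha=2/255):
    """Perturb target-domain inputs so the model predicts the forget class:
    one plausible way to synthesize forget-class samples when the source
    data itself is inaccessible (SFDA). Hyperparameters are illustrative."""
    x = x.detach()
    y = torch.full((x.size(0),), forget_class, device=x.device)
    x_adv = x.clone().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Descend on the loss so predictions move *toward* the forget class.
        x_adv = (x_adv - alpha * grad.sign()).detach()
        # Project back into the eps-ball around the clean input.
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).requires_grad_(True)
    return x_adv.detach()

def rescaled_label(logits, forget_class):
    """Zero out the forget class and renormalize the remaining probability
    mass, giving a soft target that carries no forget-class information."""
    p = F.softmax(logits, dim=1)
    p[:, forget_class] = 0.0
    return p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)

def unlearning_step(model, optimizer, x_tgt, forget_class):
    """One unlearning update on adversarially generated forget-class samples.
    In the paper this runs alongside the SFDA adaptation objective; it is
    shown in isolation here."""
    x_adv = pgd_towards_class(model, x_tgt, forget_class)
    logits = model(x_adv)
    with torch.no_grad():
        target = rescaled_label(logits, forget_class)
    loss = F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Generating forget-class samples adversarially is the natural move in this setting: since SFDA forbids access to source data, the only way to obtain inputs that exercise the forget classes is to synthesize them from target-domain data.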
Key Contributions
- Identifies privacy risk in source-free domain adaptation where models leak source-exclusive class knowledge via zero-shot transfer
- Proposes SCADA-UL, a new machine unlearning setting for domain adaptation with distribution shift
- Adversarial optimization method with rescaled labeling strategy that achieves retraining-level unlearning performance
🛡️ Threat Analysis
The paper addresses a privacy leakage problem in which source models inadvertently retain and leak sensitive source-domain-specific information (class knowledge) into the target domain during domain adaptation. The threat model involves an adversary exploiting zero-shot transfer to infer knowledge about source-exclusive classes. The proposed unlearning method is explicitly evaluated against this data leakage threat.
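As a concrete check for this leak (an illustration consistent with the evaluation described above, not the paper's exact protocol), one can score the adapted model on target-domain examples of the source-exclusive classes. The `forget_loader` below is a hypothetical DataLoader yielding such examples with their original source labels.

```python
import torch

@torch.no_grad()
def zero_shot_leakage(model, forget_loader, device="cuda"):
    """Accuracy of an adapted model on source-exclusive (forget) classes it
    never saw in the target data. Accuracy near chance suggests successful
    unlearning; accuracy near the source model's level indicates zero-shot
    transfer of forget-class knowledge."""
    model.eval()
    correct, total = 0, 0
    for x, y in forget_loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)
```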