Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
Published on arXiv: 2510.05753
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
LiRA achieves superior performance across most transfer learning scenarios, but the white-box Inverse Hessian Attack (IHA) outperforms all score-based black-box MIAs on PatchCamelyon in the high-data regime, with no single attack capturing all privacy risks.
With the emergence of powerful large-scale foundation models, the training paradigm is increasingly shifting from from-scratch training to transfer learning, which enables high-utility training with the small, domain-specific datasets typical of sensitive applications. Membership inference attacks (MIAs) provide an empirical estimate of the privacy leakage of machine learning models. Yet prior assessments of MIAs against models fine-tuned with transfer learning rely on a small subset of possible attacks. We address this by comparing the performance of diverse MIAs in transfer learning settings to help practitioners identify the most efficient attacks for privacy risk evaluation. We find that the efficacy of score-based MIAs decreases as the amount of training data increases, and that no single MIA captures all privacy risks in models trained with transfer learning. While the Likelihood Ratio Attack (LiRA) demonstrates superior performance across most experimental scenarios, the Inverse Hessian Attack (IHA) proves more effective against models fine-tuned on the PatchCamelyon dataset in the high-data regime.
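To make "score-based MIA" concrete: the simplest such attack (the LOSS attack listed later in this page) scores each example by its cross-entropy loss under the target model and flags low-loss examples as likely training members. The sketch below is illustrative only; the function names, threshold, and toy probabilities are assumptions, not taken from the paper:

```python
import numpy as np

def loss_attack_scores(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-example cross-entropy loss of the target model's predictions.

    Lower loss suggests the example was seen during training (member).
    """
    eps = 1e-12  # avoid log(0)
    return -np.log(probs[np.arange(len(labels)), labels] + eps)

def predict_members(probs: np.ndarray, labels: np.ndarray,
                    threshold: float) -> np.ndarray:
    # Classify as "member" when the loss falls below a tuned threshold.
    return loss_attack_scores(probs, labels) < threshold

# Toy example (hypothetical numbers): one confident prediction,
# one uncertain prediction, both for class 0.
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45]])
labels = np.array([0, 0])
print(predict_members(probs, labels, threshold=0.3))
# member flag for the confident example, non-member for the uncertain one
```

The stronger attacks compared in the paper (LiRA, IHA, TrajectoryMIA, etc.) refine this idea by calibrating the score per example rather than using one global threshold.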
Key Contributions
- Systematic comparison of diverse score-based MIAs (LiRA, IHA, TrajectoryMIA, QuantileMIA, etc.) in transfer learning settings with consistent experimental setups and hyperparameter optimization
- Demonstrates that MIA efficacy generally decreases as fine-tuning dataset size increases for most score-based attacks, consistent with a power-law relationship
- Identifies that no single MIA captures all privacy risks in transfer learning — LiRA is best overall, but the white-box IHA outperforms black-box methods in the high-shot regime on PatchCamelyon
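LiRA, the best-performing attack in most of the paper's scenarios, performs a per-example likelihood-ratio test: it fits Gaussians to logit-scaled confidences from shadow models trained with and without the target example, then asks under which distribution the target model's confidence is more likely. A rough sketch, where all function names and the toy statistics are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np
from scipy.stats import norm

def logit_confidence(p_true: float) -> float:
    """Logit scaling of the model's probability on the true class,
    which makes the confidence distribution closer to Gaussian."""
    eps = 1e-6
    p = np.clip(p_true, eps, 1 - eps)
    return float(np.log(p / (1 - p)))

def lira_score(target_conf: float,
               in_confs: np.ndarray,
               out_confs: np.ndarray) -> float:
    """Log-likelihood ratio: how much more likely the observed confidence
    is under the IN (member) Gaussian than under the OUT (non-member) one."""
    mu_in, sd_in = np.mean(in_confs), np.std(in_confs) + 1e-8
    mu_out, sd_out = np.mean(out_confs), np.std(out_confs) + 1e-8
    return float(norm.logpdf(target_conf, mu_in, sd_in)
                 - norm.logpdf(target_conf, mu_out, sd_out))

# Hypothetical shadow-model confidences for one target example:
in_confs = np.array([5.0, 5.2, 4.8])   # shadow models trained WITH the example
out_confs = np.array([1.0, 1.2, 0.8])  # shadow models trained WITHOUT it
print(lira_score(5.0, in_confs, out_confs))  # large positive: likely member
```

Per-example calibration of this kind is what separates LiRA from global-threshold attacks like LOSS, at the cost of training many shadow models.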
🛡️ Threat Analysis
The paper focuses entirely on comparing the efficacy of membership inference attacks (LOSS, MLLeaks, LiRA, IHA, TrajectoryMIA, QuantileMIA, SeqMIA) against fine-tuned transfer learning models, which maps directly and specifically to ML04 (Membership Inference Attack).