Mitigating Data Exfiltration Attacks through Layer-Wise Learning Rate Decay Fine-Tuning
Elie Thellier, Huiyu Li, Nicholas Ayache, Hervé Delingette
Published on arXiv (2509.00027)
Model Inversion Attack (OWASP ML Top 10: ML03)
Key Finding
LWLRD FT outperforms Vanilla FT, High LR FT, and Super-Fine-Tuning baselines in disrupting Transpose and DEC exfiltration attacks while maintaining classification utility on medical imaging benchmarks, rendering exfiltrated data unusable for downstream training.
Novel technique introduced: Layer-Wise Learning Rate Decay Fine-Tuning (LWLRD FT)
Data lakes enable the training of powerful machine learning models on sensitive, high-value medical datasets, but also introduce serious privacy risks due to potential leakage of protected health information. Recent studies show adversaries can exfiltrate training data by embedding latent representations into model parameters or inducing memorization via multi-task learning. These attacks disguise themselves as benign utility models while enabling reconstruction of high-fidelity medical images, posing severe privacy threats with legal and ethical implications. In this work, we propose a simple yet effective mitigation strategy that perturbs model parameters at export time through fine-tuning with a decaying layer-wise learning rate to corrupt embedded data without degrading task performance. Evaluations on DermaMNIST, ChestMNIST, and MIMIC-CXR show that our approach maintains utility task performance, effectively disrupts state-of-the-art exfiltration attacks, outperforms prior defenses, and renders exfiltrated data unusable for training. Ablations and discussions on adaptive attacks highlight challenges and future directions. Our findings offer a practical defense against data leakage in data lake-trained models and centralized federated learning.
Key Contributions
- Layer-Wise Learning Rate Decay Fine-Tuning (LWLRD FT): a post-training parameter perturbation scheme that applies a decaying learning rate across layers at export time to corrupt embedded exfiltration payloads without degrading classification utility
- First systematic evaluation of mitigation strategies specifically targeting neural network-based data exfiltration attacks (Transpose, DEC) on three medical imaging datasets (DermaMNIST, ChestMNIST, MIMIC-CXR)
- Ablation study and adaptive adversary analysis characterizing the utility-privacy trade-off and open challenges for data lake model export security
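The summary describes LWLRD FT only at a high level: a learning rate that decays layer-wise, applied during a short export-time fine-tuning pass. One plausible reading, sketched below, is a geometric decay with depth, so early layers (where the threat analysis says payloads are embedded) receive the largest perturbation while the task head is barely touched. The `base_lr` and `decay` hyperparameters here are illustrative assumptions, not values from the paper.

```python
def layerwise_lrs(num_layers: int, base_lr: float = 0.01, decay: float = 0.5) -> list[float]:
    """Assign a per-layer learning rate that decays geometrically with depth.

    Layer 0 (the earliest layer) gets the full base_lr; each deeper layer
    gets a rate scaled by `decay`, so an export-time fine-tuning pass
    perturbs early-layer weights the most. Hypothetical sketch of the
    schedule, not the authors' implementation.
    """
    return [base_lr * decay**layer for layer in range(num_layers)]

# Example schedule for a 4-layer network:
print(layerwise_lrs(4))  # [0.01, 0.005, 0.0025, 0.00125]
```

In a typical deep-learning framework this schedule would be attached via per-layer parameter groups before running the fine-tuning pass.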
🛡️ Threat Analysis
The attacks defended against (Transpose and DEC) reconstruct private training data from model parameters: Transpose uses reversible networks to memorize images, while DEC steganographically embeds training data in the model weights. The proposed defense (LWLRD FT) disrupts this reconstruction by perturbing early-layer weights at export time, directly targeting the training-data reconstruction threat model.
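To make the export-time mechanism concrete, the toy sketch below applies one SGD step with layer-wise decayed rates to a tiny two-layer weight list and checks that the early layer moves more than the deep layer. This is an illustration of the stated effect (early layers perturbed most), not the paper's training loop; all values are invented.

```python
def export_perturb(weights, grads, base_lr=0.01, decay=0.5):
    """One SGD step with a layer-wise decayed learning rate.

    `weights` and `grads` are lists of per-layer weight lists. Layer 0
    uses the full base_lr; deeper layers use base_lr * decay**layer,
    so the update scrambles early-layer weights (where an exfiltration
    payload would sit) far more than the task head. Toy sketch only.
    """
    updated = []
    for layer, (w_layer, g_layer) in enumerate(zip(weights, grads)):
        lr = base_lr * decay**layer
        updated.append([w - lr * g for w, g in zip(w_layer, g_layer)])
    return updated

weights = [[1.0, 1.0], [1.0, 1.0]]   # layer 0 (early), layer 1 (deep)
grads   = [[1.0, 1.0], [1.0, 1.0]]
new_w = export_perturb(weights, grads)
# Early layer shifts by 0.01 per weight, deep layer by only 0.005.
```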