defense 2025

Mitigating Data Exfiltration Attacks through Layer-Wise Learning Rate Decay Fine-Tuning

Elie Thellier , Huiyu Li , Nicholas Ayache , Hervé Delingette

6th MICCAI Workshop on "Distri...


Published on arXiv (2509.00027)

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

LWLRD FT outperforms Vanilla FT, High LR FT, and Super-Fine-Tuning baselines in disrupting Transpose and DEC exfiltration attacks while maintaining classification utility on medical imaging benchmarks, rendering exfiltrated data unusable for downstream training.

Layer-Wise Learning Rate Decay Fine-Tuning (LWLRD FT)

Novel technique introduced


Data lakes enable the training of powerful machine learning models on sensitive, high-value medical datasets, but also introduce serious privacy risks due to potential leakage of protected health information. Recent studies show adversaries can exfiltrate training data by embedding latent representations into model parameters or inducing memorization via multi-task learning. These attacks disguise themselves as benign utility models while enabling reconstruction of high-fidelity medical images, posing severe privacy threats with legal and ethical implications. In this work, we propose a simple yet effective mitigation strategy that perturbs model parameters at export time through fine-tuning with a decaying layer-wise learning rate to corrupt embedded data without degrading task performance. Evaluations on DermaMNIST, ChestMNIST, and MIMIC-CXR show that our approach maintains utility task performance, effectively disrupts state-of-the-art exfiltration attacks, outperforms prior defenses, and renders exfiltrated data unusable for training. Ablations and discussions on adaptive attacks highlight challenges and future directions. Our findings offer a practical defense against data leakage in data lake-trained models and centralized federated learning.


Key Contributions

  • Layer-Wise Learning Rate Decay Fine-Tuning (LWLRD FT): a post-training parameter perturbation scheme that applies a decaying learning rate across layers at export time to corrupt embedded exfiltration payloads without degrading classification utility
  • First systematic evaluation of mitigation strategies specifically targeting neural network-based data exfiltration attacks (Transpose, DEC) on three medical imaging datasets (DermaMNIST, ChestMNIST, MIMIC-CXR)
  • Ablation study and adaptive adversary analysis characterizing the utility-privacy trade-off and open challenges for data lake model export security
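The core mechanism above, fine-tuning with a learning rate that decays layer by layer so early layers (where exfiltration payloads tend to reside) are perturbed most while the task head is barely touched, can be sketched as a per-layer SGD schedule. The following is a minimal NumPy illustration; the `base_lr` and `decay` values and the `layerwise_lrs`/`sgd_step` helpers are illustrative, not taken from the paper:

```python
import numpy as np

def layerwise_lrs(num_layers, base_lr=1e-2, decay=0.5):
    """Learning rate per layer, decaying with depth: the earliest layer
    receives the largest rate (strongest perturbation), the last layer
    the smallest, preserving task-relevant late-layer features."""
    return [base_lr * decay**d for d in range(num_layers)]

def sgd_step(weights, grads, lrs):
    """One manual fine-tuning SGD update with per-layer learning rates."""
    return [w - lr * g for w, g, lr in zip(weights, grads, lrs)]

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 4)) for _ in range(3)]  # toy 3-layer stand-in
grads = [rng.normal(size=(4, 4)) for _ in range(3)]
lrs = layerwise_lrs(3)  # [0.01, 0.005, 0.0025]
new_weights = sgd_step(weights, grads, lrs)
# Each layer's weight change scales with its learning rate, so layer 0
# moves roughly 4x as far as layer 2 for gradients of similar magnitude.
```

In a real framework this corresponds to building per-layer optimizer parameter groups with the decayed rates before a short export-time fine-tuning pass.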

🛡️ Threat Analysis

Model Inversion Attack

The attacks defended against (Transpose, DEC) reconstruct private training data from model parameters: Transpose uses reversible networks to memorize training images, while DEC steganographically embeds them in the model weights. The proposed defense (LWLRD FT) disrupts reconstruction by perturbing early-layer weights at export time, directly targeting this training-data reconstruction threat model.
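To see why an export-time parameter perturbation corrupts a hidden payload, consider a toy steganographic scheme (not the actual DEC encoding) that hides one payload bit in the lowest mantissa bit of each float32 weight. A perturbation far larger than a weight's least significant bit scrambles the hidden channel while leaving the weight values, and hence task behavior, nearly unchanged:

```python
import numpy as np

def embed_bits(weights, payload_bits):
    """Toy embedding: write one payload bit into the lowest mantissa bit
    of each float32 weight (illustrative only, not the DEC scheme)."""
    raw = weights.astype(np.float32).view(np.uint32)
    raw = (raw & ~np.uint32(1)) | payload_bits.astype(np.uint32)
    return raw.view(np.float32)

def extract_bits(weights):
    """Read back the lowest mantissa bit of each float32 weight."""
    return weights.view(np.uint32) & np.uint32(1)

rng = np.random.default_rng(1)
w = rng.normal(size=256).astype(np.float32)
bits = rng.integers(0, 2, size=256)

w_stego = embed_bits(w, bits)
assert (extract_bits(w_stego) == bits).all()  # payload survives a clean export

# A tiny export-time perturbation (much larger than one float32 ulp but
# negligible for the task) randomizes the recovered bits.
w_perturbed = w_stego + (1e-4 * rng.normal(size=256)).astype(np.float32)
recovered = extract_bits(w_perturbed)
```

After the perturbation, `recovered` agrees with `bits` only at roughly chance level, which is the intuition behind corrupting embedded data without degrading utility.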


Details

Domains
vision
Model Types
cnn
Threat Tags
training_time, white_box
Datasets
DermaMNIST, ChestMNIST, MIMIC-CXR
Applications
medical image classification, data lake security, centralized federated learning