defense 2026

Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

Jiayang Meng 1, Tao Huang 2, Chen Hou 2, Guolong Zheng 2, Hong Chen 1



Published on arXiv: 2602.22611

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Under the same privacy budget, LM-DP-SGD reduces peak intermediate-representation-level MIA risk compared to uniform DP-SGD while preserving downstream task utility.

LM-DP-SGD

Novel technique introduced


In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
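The abstract's risk-estimation step — fit a per-layer MIA adversary on shadow-model IRs and score each layer by attack performance — can be illustrated with a minimal sketch. The paper's actual adversaries are unspecified here, so this uses a simple threshold attack on an assumed scalar IR statistic (the embedding norm); the function name and the synthetic "leaky" vs "safe" layers are illustrative only.

```python
import numpy as np

def layer_mia_risk(member_feats, nonmember_feats):
    """Threshold-based MIA on one layer's IRs (illustrative stand-in for
    the paper's layer-specific adversaries). Members come from the shadow
    train split, non-members from the shadow test split; the returned
    score is the best balanced attack accuracy over a grid of thresholds
    on the IR norm, so 0.5 means no leakage and 1.0 means full leakage."""
    scores_in = np.linalg.norm(member_feats, axis=1)
    scores_out = np.linalg.norm(nonmember_feats, axis=1)
    thresholds = np.quantile(np.concatenate([scores_in, scores_out]),
                             np.linspace(0.05, 0.95, 19))
    best = 0.5
    for t in thresholds:
        # Balanced accuracy of the rule "norm <= t => member" (and its flip).
        acc = 0.5 * ((scores_in <= t).mean() + (scores_out > t).mean())
        best = max(best, acc, 1.0 - acc)
    return best

rng = np.random.default_rng(0)
# Hypothetical leaky layer: member IRs are distributed more tightly.
risk_leaky = layer_mia_risk(rng.normal(0.0, 1.0, (500, 8)),
                            rng.normal(0.0, 1.5, (500, 8)))
# Hypothetical safe layer: identical member/non-member distributions.
risk_safe = layer_mia_risk(rng.normal(0.0, 1.0, (500, 8)),
                           rng.normal(0.0, 1.0, (500, 8)))
```

Per the abstract, such per-layer scores (the paper uses attack error rates) transfer across datasets and become the weights that steer clipping during private training.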


Key Contributions

  • LM-DP-SGD: a DP-SGD variant that estimates per-layer MIA risk via shadow-model adversaries and reweights gradient clipping proportionally to layer vulnerability under a fixed noise budget
  • Theoretical guarantees on both privacy (differential privacy) and convergence for the proposed layer-wise reweighted clipping scheme
  • Empirical demonstration that LM-DP-SGD achieves a superior privacy-utility trade-off compared to uniform DP-SGD by reducing peak IR-level MIA risk while preserving model utility
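One plausible reading of the reweighted-clipping contribution, as a sketch: scale each layer's per-example gradient by a layer weight (derived from its estimated MIA risk), clip the reweighted gradient by its global norm, then add Gaussian noise of fixed magnitude. This is not the paper's exact algorithm; the function, the weight values, and the layer names are assumptions for illustration.

```python
import numpy as np

def lm_dp_sgd_step(per_example_grads, layer_weights, clip_norm, noise_mult, rng):
    """One private update in the spirit of layer-wise reweighted DP-SGD.
    per_example_grads: list (over examples) of dicts {layer: ndarray}.
    layer_weights: dict {layer: weight}; smaller weight attenuates a
    layer's contribution before the single global clip, giving it
    relatively more protection under the fixed noise magnitude."""
    layers = list(per_example_grads[0].keys())
    summed = {k: np.zeros_like(per_example_grads[0][k]) for k in layers}
    for grads in per_example_grads:
        rew = {k: layer_weights[k] * grads[k] for k in layers}
        global_norm = np.sqrt(sum(float(np.sum(v * v)) for v in rew.values()))
        scale = min(1.0, clip_norm / (global_norm + 1e-12))  # global clip
        for k in layers:
            summed[k] += scale * rew[k]
    n = len(per_example_grads)
    # Gaussian noise calibrated to the clip norm, as in standard DP-SGD.
    return {k: (summed[k] + rng.normal(0.0, noise_mult * clip_norm,
                                       summed[k].shape)) / n
            for k in layers}

rng = np.random.default_rng(1)
grads = [{"layer0": np.array([3.0, 4.0]), "layer1": np.array([0.0, 0.0])}]
weights = {"layer0": 0.5, "layer1": 1.0}  # hypothetical: riskier layer attenuated
# noise_mult=0 isolates the reweighting + clipping arithmetic for inspection.
update = lm_dp_sgd_step(grads, weights, clip_norm=10.0, noise_mult=0.0, rng=rng)
clipped = lm_dp_sgd_step(grads, weights, clip_norm=1.0, noise_mult=0.0, rng=rng)
```

With `clip_norm=10` the reweighted gradient `[1.5, 2.0]` (norm 2.5) passes through unclipped; with `clip_norm=1` it is rescaled to unit global norm, matching the abstract's "globally clipped gradient" with layer-dependent contributions.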

🛡️ Threat Analysis

Membership Inference Attack

The paper's primary contribution is a defense against Membership Inference Attacks (MIAs) on intermediate representations in EaaI settings; it trains layer-specific MIA adversaries to estimate per-layer vulnerability and uses those estimates to guide adaptive DP-SGD clipping, with success measured by reduction in MIA risk across layers.


Details

Domains
nlp
Model Types
transformer, traditional_ml
Threat Tags
black_box, inference_time, training_time
Applications
embedding-as-a-service, knowledge distillation, modular ml systems