defense 2026

Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

Jiayang Meng 1, Tao Huang 2, Chen Hou 2, Guolong Zheng 2, Hong Chen 1



Published on arXiv: 2602.22611

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Under the same privacy budget, LM-DP-SGD reduces peak intermediate-representation-level MIA risk compared to uniform DP-SGD while preserving downstream task utility.

LM-DP-SGD

Novel technique introduced


In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
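The abstract's risk-estimation step — fit a per-layer MIA adversary on shadow-model IRs and score each layer by attack performance — can be illustrated with a minimal sketch. The paper's actual adversaries are unspecified here, so this uses a simple threshold attack on an assumed scalar IR statistic (the embedding norm); the function name and the synthetic "leaky" vs "safe" layers are illustrative only.

```python
import numpy as np

def layer_mia_risk(member_feats, nonmember_feats):
    """Threshold-based MIA on one layer's IRs (illustrative stand-in for
    the paper's layer-specific adversaries). Members come from the shadow
    train split, non-members from the shadow test split; the returned
    score is the best balanced attack accuracy over a grid of thresholds
    on the IR norm, so 0.5 means no leakage and 1.0 means full leakage."""
    scores_in = np.linalg.norm(member_feats, axis=1)
    scores_out = np.linalg.norm(nonmember_feats, axis=1)
    thresholds = np.quantile(np.concatenate([scores_in, scores_out]),
                             np.linspace(0.05, 0.95, 19))
    best = 0.5
    for t in thresholds:
        # Balanced accuracy of the rule "norm <= t => member" (and its flip).
        acc = 0.5 * ((scores_in <= t).mean() + (scores_out > t).mean())
        best = max(best, acc, 1.0 - acc)
    return best

rng = np.random.default_rng(0)
# Hypothetical leaky layer: member IRs are distributed more tightly.
risk_leaky = layer_mia_risk(rng.normal(0.0, 1.0, (500, 8)),
                            rng.normal(0.0, 1.5, (500, 8)))
# Hypothetical safe layer: identical member/non-member distributions.
risk_safe = layer_mia_risk(rng.normal(0.0, 1.0, (500, 8)),
                           rng.normal(0.0, 1.0, (500, 8)))
```

Per the abstract, such per-layer scores (the paper uses attack error rates) transfer across datasets and become the weights that steer clipping during private training.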


Key Contributions

  • LM-DP-SGD: a DP-SGD variant that estimates per-layer MIA risk via shadow-model adversaries and reweights gradient clipping proportionally to layer vulnerability under a fixed noise budget
  • Theoretical guarantees on both privacy (differential privacy) and convergence for the proposed layer-wise reweighted clipping scheme
  • Empirical demonstration that LM-DP-SGD achieves a superior privacy-utility trade-off compared to uniform DP-SGD by reducing peak IR-level MIA risk while preserving model utility
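One plausible reading of the reweighted-clipping contribution, as a sketch: scale each layer's per-example gradient by a layer weight (derived from its estimated MIA risk), clip the reweighted gradient by its global norm, then add Gaussian noise of fixed magnitude. This is not the paper's exact algorithm; the function, the weight values, and the layer names are assumptions for illustration.

```python
import numpy as np

def lm_dp_sgd_step(per_example_grads, layer_weights, clip_norm, noise_mult, rng):
    """One private update in the spirit of layer-wise reweighted DP-SGD.
    per_example_grads: list (over examples) of dicts {layer: ndarray}.
    layer_weights: dict {layer: weight}; smaller weight attenuates a
    layer's contribution before the single global clip, giving it
    relatively more protection under the fixed noise magnitude."""
    layers = list(per_example_grads[0].keys())
    summed = {k: np.zeros_like(per_example_grads[0][k]) for k in layers}
    for grads in per_example_grads:
        rew = {k: layer_weights[k] * grads[k] for k in layers}
        global_norm = np.sqrt(sum(float(np.sum(v * v)) for v in rew.values()))
        scale = min(1.0, clip_norm / (global_norm + 1e-12))  # global clip
        for k in layers:
            summed[k] += scale * rew[k]
    n = len(per_example_grads)
    # Gaussian noise calibrated to the clip norm, as in standard DP-SGD.
    return {k: (summed[k] + rng.normal(0.0, noise_mult * clip_norm,
                                       summed[k].shape)) / n
            for k in layers}

rng = np.random.default_rng(1)
grads = [{"layer0": np.array([3.0, 4.0]), "layer1": np.array([0.0, 0.0])}]
weights = {"layer0": 0.5, "layer1": 1.0}  # hypothetical: riskier layer attenuated
# noise_mult=0 isolates the reweighting + clipping arithmetic for inspection.
update = lm_dp_sgd_step(grads, weights, clip_norm=10.0, noise_mult=0.0, rng=rng)
clipped = lm_dp_sgd_step(grads, weights, clip_norm=1.0, noise_mult=0.0, rng=rng)
```

With `clip_norm=10` the reweighted gradient `[1.5, 2.0]` (norm 2.5) passes through unclipped; with `clip_norm=1` it is rescaled to unit global norm, matching the abstract's "globally clipped gradient" with layer-dependent contributions.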

🛡️ Threat Analysis

Membership Inference Attack

The paper's primary contribution is a defense against Membership Inference Attacks (MIAs) on intermediate representations in EaaI settings; it trains layer-specific MIA adversaries to estimate per-layer vulnerability and uses those estimates to guide adaptive DP-SGD clipping, with success measured by reduction in MIA risk across layers.


Details

Domains
nlp
Model Types
transformer, traditional_ml
Threat Tags
black_box, inference_time, training_time
Applications
embedding-as-a-service, knowledge distillation, modular ml systems