Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach
Guilin Deng 1, Silong Chen 1, Yuchuan Luo 1, Yi Liu 2, Songlei Wang 3, Zhiping Cai 1, Lin Liu 1, Xiaohua Jia 2, Shaojing Fu 1
Published on arXiv
2604.21197
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Achieves near-100% membership inference accuracy on federated LLMs, outperforming existing MIA baselines by up to 75.75% in absolute accuracy
ProjRes
Novel technique introduced
Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing the challenges of limited resources and privacy concerns. Despite data localization, shared gradients can still expose sensitive information through membership inference attacks (MIAs). However, the unique properties of FedLLMs, i.e., massive parameter scales, rapid convergence, and sparse, non-orthogonal gradients, render existing MIAs ineffective. To address this gap, we propose ProjRes, the first projection-residual-based passive MIA tailored for FedLLMs. ProjRes uses hidden embedding vectors as sample representations and analyzes their projection residuals on the gradient subspace to uncover the intrinsic link between gradients and inputs. It requires no shadow models, auxiliary classifiers, or historical updates, which makes it efficient and robust. Experiments on four benchmarks and four LLMs show that ProjRes achieves near-100% accuracy, outperforming prior methods by up to 75.75%, and remains effective even under strong differential privacy defenses. Our findings reveal a previously overlooked privacy vulnerability in FedLLMs and call for a re-examination of their security assumptions. Our code and data are available at https://anonymous.4open.science/r/Passive-MIA-5268.
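The abstract refers to an intrinsic link between gradients and inputs. The derivation below shows one standard way to see such a link. It is illustrative and not necessarily the paper's formulation. It rests on three assumptions: the observed gradient belongs to a single linear layer z_i = W h_i, where W is an m x d matrix; the layer's input is the hidden embedding h_i of dimension d; and the batch loss is a sum of per-sample losses l_i over a batch B. The symbols G, delta_i, V, and r(x) are introduced here for illustration.

```latex
G = \frac{\partial \mathcal{L}}{\partial W}
  = \sum_{i \in \mathcal{B}} \delta_i \, h_i^{\top},
\qquad
\delta_i = \frac{\partial \ell_i}{\partial z_i} \in \mathbb{R}^{m}

r(x) = \frac{\lVert x - V V^{\top} x \rVert_2}{\lVert x \rVert_2},
\qquad
V \in \mathbb{R}^{d \times k},\; V^{\top} V = I_k,\; \operatorname{col}(V) = \operatorname{row}(G)
```

Every row of G is a linear combination of the batch embeddings h_i, so row(G) lies inside span{h_i : i in B}. The two are equal when the delta_i are linearly independent. In that idealized case r(h_j) = 0 for every sample j in the batch, whereas a generic embedding outside the batch has r(x) > 0 whenever |B| < d. Real FedLLM gradients are sparse and non-orthogonal and come from many layers, so this exact-zero picture is a simplification.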
Key Contributions
- First projection-residual-based passive MIA specifically designed for federated LLMs
- Analyzes the projection residuals of hidden embeddings on the gradient subspace, without requiring shadow models, auxiliary classifiers, or historical updates
- Achieves near-100% MIA accuracy, outperforming prior methods by up to 75.75%, and remains effective under strong differential privacy defenses
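The residual test described in the contributions can be sketched in a few lines. This is a minimal toy sketch, not the authors' ProjRes implementation (their code is at the linked repository). It makes three assumptions:

- The observed gradient comes from one linear layer whose input is the hidden embedding.
- The embeddings are random vectors.
- The helper names, sizes, and threshold are hypothetical.

The sketch builds the gradient as a sum of outer products, takes an orthonormal basis of its row space via SVD, and scores each candidate by how much of it lies outside that subspace.

```python
import numpy as np


def row_space_basis(grad: np.ndarray, rel_tol: float = 1e-10) -> np.ndarray:
    """Orthonormal basis (as rows) for the row space of a weight gradient."""
    _, s, vt = np.linalg.svd(grad, full_matrices=False)
    # Singular values come back sorted in descending order; drop the numerically zero ones.
    return vt[s > rel_tol * s[0]]


def projection_residual(basis: np.ndarray, x: np.ndarray) -> float:
    """Relative norm of the part of x that lies outside span(basis)."""
    projection = basis.T @ (basis @ x)
    return float(np.linalg.norm(x - projection) / np.linalg.norm(x))


rng = np.random.default_rng(0)
in_dim, out_dim, batch = 64, 32, 8  # hypothetical toy sizes

# Hypothetical client batch: hidden embeddings h_i and upstream gradients delta_i.
members = rng.standard_normal((batch, in_dim))
deltas = rng.standard_normal((batch, out_dim))

# Gradient of a linear layer z = W h summed over the batch: sum_i delta_i h_i^T.
grad = deltas.T @ members  # shape (out_dim, in_dim)

# Embeddings of samples the client did not train on in this round.
non_members = rng.standard_normal((batch, in_dim))

basis = row_space_basis(grad)
member_scores = [projection_residual(basis, h) for h in members]
non_member_scores = [projection_residual(basis, h) for h in non_members]

# Simple decision rule: a small residual suggests membership.
threshold = 0.5  # hypothetical; a real attack would need to choose this carefully
print("members flagged:", sum(score < threshold for score in member_scores))
print("non-members flagged:", sum(score < threshold for score in non_member_scores))
```

Because each member embedding lies in the row space of this toy gradient, the member residuals are zero up to floating-point error. The independently drawn non-members keep most of their norm outside the 8-dimensional subspace. No shadow model or auxiliary classifier is involved: the score is computed directly from the observed gradient.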
🛡️ Threat Analysis
The primary contribution is a membership inference attack (MIA) that determines whether specific samples were used in training a FedLLM by analyzing the projection residuals of their embeddings on the gradient subspace.
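The abstract reports that ProjRes stays effective under differential privacy defenses. This toy sketch does not reproduce those experiments. It illustrates one issue a residual test must handle once noise is added: the noisy gradient becomes full rank, so its exact row space is no longer spanned by the batch embeddings alone. Keeping only the top-k singular directions restores a usable subspace when the noise is small relative to the gradient signal. This assumes k equals the client's batch size, which is a hypothetical assumption about attacker knowledge. The noise level, threshold, and helper names are also hypothetical, and no clipping or DP accounting is performed.

```python
import numpy as np


def top_k_basis(grad: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular vectors of grad, as orthonormal rows."""
    _, _, vt = np.linalg.svd(grad, full_matrices=False)
    return vt[:k]


def projection_residual(basis: np.ndarray, x: np.ndarray) -> float:
    """Relative norm of the part of x that lies outside span(basis)."""
    return float(np.linalg.norm(x - basis.T @ (basis @ x)) / np.linalg.norm(x))


rng = np.random.default_rng(1)
in_dim, out_dim, batch = 64, 32, 8  # hypothetical toy sizes
noise_std = 0.1  # hypothetical noise level, small relative to the gradient signal

members = rng.standard_normal((batch, in_dim))
deltas = rng.standard_normal((batch, out_dim))
clean_grad = deltas.T @ members  # sum_i delta_i h_i^T for a linear layer
# Isotropic Gaussian noise on the shared gradient (no per-sample clipping here).
noisy_grad = clean_grad + rng.normal(0.0, noise_std, size=clean_grad.shape)

non_members = rng.standard_normal((batch, in_dim))
candidates = np.vstack([members, non_members])
labels = np.array([True] * batch + [False] * batch)  # ground-truth membership

# Noise makes the gradient full rank, so keep only the k strongest directions;
# this assumes the attacker knows (or can estimate) the client's batch size.
basis = top_k_basis(noisy_grad, k=batch)
scores = np.array([projection_residual(basis, x) for x in candidates])

threshold = 0.5  # hypothetical, uncalibrated threshold
predictions = scores < threshold  # low residual => predicted member
accuracy = float(np.mean(predictions == labels))
print(f"attack accuracy: {accuracy:.2f}")
```

With stronger noise, or with per-sample clipping as in DP-SGD, the top-k directions drift away from the batch embeddings, and a fixed threshold like this one would need recalibration. The paper's own evaluation covers strong DP settings that this toy does not.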