attack 2026

Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Guilin Deng 1, Silong Chen 1, Yuchuan Luo 1, Yi Liu 2, Songlei Wang 3, Zhiping Cai 1, Lin Liu 1, Xiaohua Jia 2, Shaojing Fu 1

Published on arXiv: 2604.21197

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Achieves near-100% membership inference accuracy on federated LLMs, outperforming existing MIA baselines by up to 75.75% in absolute accuracy

ProjRes

Novel technique introduced


Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges of limited resources and privacy concerns. Despite data localization, shared gradients can still expose sensitive information through membership inference attacks (MIAs). However, FedLLMs' unique properties, i.e., massive parameter scales, rapid convergence, and sparse, non-orthogonal gradients, render existing MIAs ineffective. To address this gap, we propose ProjRes, the first projection-residual-based passive MIA tailored for FedLLMs. ProjRes uses hidden embedding vectors as sample representations and analyzes their projection residuals on the gradient subspace to uncover the intrinsic link between gradients and inputs. It requires no shadow models, auxiliary classifiers, or historical updates, ensuring efficiency and robustness. Experiments on four benchmarks and four LLMs show that ProjRes achieves near-100% accuracy, outperforms prior methods by up to 75.75%, and remains effective even under strong differential privacy defenses. Our findings reveal a previously overlooked privacy vulnerability in FedLLMs and call for a re-examination of their security assumptions. Our code and data are available at https://anonymous.4open.science/r/Passive-MIA-5268.
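A standard linear-layer identity gives some intuition for why a gradient subspace can reveal its inputs. Assume, purely for illustration, a single linear layer $y = W h$ applied to the hidden embeddings $h_1, \dots, h_B$ of one batch with total loss $\mathcal{L}$; this toy setting and its notation are ours, not the paper's formulation. The weight gradient is a sum of outer products, so every row of it lies in $\operatorname{span}\{h_1, \dots, h_B\}$, and a projection residual $r(h)$ measures how far a candidate embedding sits from that subspace:

```latex
\nabla_W \mathcal{L} = \sum_{i=1}^{B} \delta_i \, h_i^{\top},
\qquad
\delta_i = \frac{\partial \mathcal{L}}{\partial y_i},
\qquad
r(h) = \frac{\lVert h - P_{\mathcal{S}} \, h \rVert_2}{\lVert h \rVert_2},
\qquad
\mathcal{S} = \operatorname{span}\{\text{rows of } \nabla_W \mathcal{L}\}.
```

When the $\delta_i$ are in general position and the output dimension is at least $B$, $\mathcal{S}$ equals the span of the batch embeddings, so $r(h_j) = 0$ for every batch member $h_j$; provided $B$ is smaller than the embedding dimension, a generic non-member embedding has $r(h) > 0$.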


Key Contributions

  • First projection-residual-based passive MIA specifically designed for federated LLMs
  • Analyzes the projection residuals of hidden embeddings on the gradient subspace, without requiring shadow models, auxiliary classifiers, or historical updates
  • Achieves near 100% MIA accuracy, outperforming prior methods by up to 75.75%, even under differential privacy defenses

🛡️ Threat Analysis

Membership Inference Attack

The primary contribution is a membership inference attack (MIA) that determines whether specific samples were used to fine-tune a FedLLM by analyzing the projection residuals of their hidden embeddings on the gradient subspace.
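The residual test described above can be sketched in plain Python. This is a toy illustration under our own assumptions, not the ProjRes implementation: the attacker is assumed to observe the weight gradient of one linear layer and the candidate sample's hidden embedding at that layer's input, with one embedding vector per sample. All function names and `THRESHOLD` are hypothetical. Because gradient rows are generally not orthogonal, the sketch orthonormalizes them with modified Gram-Schmidt before projecting.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def random_vector(rng, n):
    """Standard-normal vector of length n (toy data only)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def linear_layer_gradient(deltas, embeddings):
    """Toy weight gradient of y = W h over a batch: G = sum_i delta_i h_i^T.

    deltas[i] is dL/dy_i (length d_out); embeddings[i] is h_i (length d_in).
    Each row of G is a linear combination of the batch embeddings.
    """
    d_out, d_in = len(deltas[0]), len(embeddings[0])
    grad = [[0.0] * d_in for _ in range(d_out)]
    for delta, h in zip(deltas, embeddings):
        for r in range(d_out):
            for c in range(d_in):
                grad[r][c] += delta[r] * h[c]
    return grad


def orthonormal_basis(rows, rel_tol=1e-9):
    """Modified Gram-Schmidt over the (non-orthogonal) gradient rows.

    Rows whose remaining norm is negligible relative to their original
    norm are treated as linearly dependent and skipped.
    """
    basis = []
    for v in rows:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm_w, norm_v = math.sqrt(dot(w, w)), math.sqrt(dot(v, v))
        if norm_v > 0 and norm_w > rel_tol * norm_v:
            basis.append([wi / norm_w for wi in w])
    return basis


def projection_residual(h, basis):
    """Relative residual ||h - P h|| / ||h|| of h against span(basis)."""
    r = list(h)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.sqrt(dot(r, r)) / math.sqrt(dot(h, h))


THRESHOLD = 1e-3  # hypothetical decision threshold, not taken from the paper


def infer_membership(h, grad):
    """Flag h as a member if it lies (numerically) in the gradient row space."""
    return projection_residual(h, orthonormal_basis(grad)) < THRESHOLD


if __name__ == "__main__":
    rng = random.Random(0)
    d_in, d_out, batch = 16, 8, 3
    members = [random_vector(rng, d_in) for _ in range(batch)]
    deltas = [random_vector(rng, d_out) for _ in range(batch)]
    grad = linear_layer_gradient(deltas, members)
    outsiders = [random_vector(rng, d_in) for _ in range(batch)]
    for name, group in (("member", members), ("non-member", outsiders)):
        for h in group:
            print(name, infer_membership(h, grad))
```

In this toy setting, batch embeddings produce residuals at floating-point noise level, while fresh random vectors typically leave most of their norm outside the three-dimensional gradient subspace. Real FedLLM gradients aggregate many tokens and layers and may be noised by defenses, so this sketch only conveys the geometric intuition, not the attack's reported performance.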


Details

Domains
nlp, federated-learning
Model Types
llm, transformer, federated
Threat Tags
training_time, white_box
Datasets
SST-2
Applications
federated learning, llm fine-tuning, privacy-sensitive domains (healthcare, finance)