Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges of limited resources and privacy concerns. Despite data localization, shared gradients can still expose sensitive information through membership inference attacks (MIAs). However, FedLLMs' unique properties, i.e., massive parameter scales, rapid convergence, and sparse, non-orthogonal gradients, render existing MIAs ineffective. To address this gap, we propose ProjRes, the first projection-residual-based passive MIA tailored for FedLLMs. ProjRes leverages hidden embedding vectors as sample representations and analyzes their projection residuals on the gradient subspace to uncover the intrinsic link between gradients and inputs. It requires no shadow models, auxiliary classifiers, or historical updates, ensuring efficiency and robustness. Experiments on four benchmarks and four LLMs show that ProjRes achieves near 100% accuracy, outperforming prior methods by up to 75.75%, and remains effective even under strong differential privacy defenses. Our findings reveal a previously overlooked privacy vulnerability in FedLLMs and call for a re-examination of their security assumptions. Our code and data are available at https://anonymous.4open.science/r/Passive-MIA-5268.
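
The link between gradients and inputs mentioned in the abstract can be illustrated with a standard property of linear layers. The derivation below is a sketch under simplifying assumptions (a single linear layer, one local step, no clipping or noise); the symbols $W$, $h_i$, $\delta_i$, $V$, and $r(h)$ are introduced here for illustration, and the paper's exact formulation may differ.

```latex
% Linear layer y_i = W h_i, with loss L summed over a local batch B.
\nabla_W L \;=\; \sum_{i \in B} \delta_i \, h_i^{\top},
\qquad \delta_i = \frac{\partial L}{\partial y_i}

% Hence every row of the gradient is a linear combination of the inputs:
\operatorname{rowspace}(\nabla_W L) \;\subseteq\; \operatorname{span}\{\, h_i : i \in B \,\}

% With V an orthonormal basis (as columns) of that row space,
% the relative projection residual of a candidate embedding h is
r(h) \;=\; \frac{\lVert h - V V^{\top} h \rVert_2}{\lVert h \rVert_2}
```

If the $\delta_i$ are linearly independent, the inclusion becomes an equality, so $r(h_i) \approx 0$ for every sample in the batch, while an embedding unrelated to the batch generally leaves a residual well above zero.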

Key Contributions

First projection residual-based passive MIA specifically designed for federated LLMs
Analyzes hidden embedding projection residuals on gradient subspaces without requiring shadow models or auxiliary classifiers
Achieves near 100% MIA accuracy, outperforming prior methods by up to 75.75%, even under differential privacy defenses

🛡️ Threat Analysis

Membership Inference Attack

Primary contribution is a membership inference attack (MIA) that determines whether specific samples were used in training FedLLMs by analyzing projection residuals of embeddings on gradient subspaces.
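
As a rough numerical illustration of the projection-residual idea, the sketch below builds the weight gradient of a single linear layer from a small batch and compares residuals for batch members and an outside sample. It is not the authors' implementation: the function `projection_residual`, the random stand-in embeddings, the error signals `deltas`, and all dimensions are assumptions chosen for the example.

```python
import numpy as np

def projection_residual(h, grad, rank_tol=1e-8):
    """Relative norm of the part of h lying outside the row space of grad."""
    # Right singular vectors with non-negligible singular values
    # form an orthonormal basis of the gradient's row space.
    _, s, vt = np.linalg.svd(grad, full_matrices=False)
    basis = vt[s > rank_tol * s.max()]            # shape (r, d_in)
    residual = h - basis.T @ (basis @ h)
    return np.linalg.norm(residual) / np.linalg.norm(h)

rng = np.random.default_rng(0)
d_in, d_out, batch = 64, 32, 4

# Hypothetical hidden embeddings entering one linear layer y = W h.
members = rng.normal(size=(batch, d_in))
non_member = rng.normal(size=d_in)

# Stand-in upstream error signals dL/dy. For a linear layer, the weight
# gradient is the sum of outer products delta_i h_i^T over the batch.
deltas = rng.normal(size=(batch, d_out))
grad = deltas.T @ members                         # shape (d_out, d_in)

member_scores = [projection_residual(h, grad) for h in members]
non_member_score = projection_residual(non_member, grad)

print("member residuals:    ", member_scores)
print("non-member residual: ", non_member_score)
```

In this toy setting the member residuals are numerically zero while the non-member residual stays close to one, so a simple threshold separates the two. Real FedLLM gradients involve many layers, multiple local steps, and possibly clipping or noise, so the gap in practice would depend on those details.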

Details

Domains

nlp, federated-learning

Model Types

llm, transformer, federated

Threat Tags

training_time, white_box

Datasets

SST-2

Applications

2026 · 0 citations

Similar Papers

SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models

G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs

Towards Privacy-Preserving Mental Health Support with Large Language Models

Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning

DP-FedLoRA: Privacy-Enhanced Federated Fine-Tuning for On-Device Large Language Models

From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models

Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis

Learning the Signature of Memorization in Autoregressive Language Models