Reconstructing Training Data from Adapter-based Federated Large Language Models
Silong Chen 1, Yuchuan Luo 1, Guilin Deng 1, Yi Liu 2, Min Xu 1, Shaojing Fu 1, Xiaohua Jia 2
Published on arXiv (2601.17533)
Model Inversion Attack
OWASP ML Top 10 — ML03
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
UTR achieves near-perfect reconstruction (ROUGE-1/2 scores above 99) on adapter-based FedLLMs, including Qwen2.5-7B, even at large batch sizes where existing gradient inversion attacks fail completely.
UTR (Unordered-word-bag-based Text Reconstruction)
Novel technique introduced
Adapter-based Federated Large Language Models (FedLLMs) are widely adopted to reduce the computational, storage, and communication overhead of full-parameter fine-tuning for web-scale applications while preserving user privacy. By freezing the backbone and training only compact low-rank adapters, these methods appear to limit gradient leakage and thwart existing Gradient Inversion Attacks (GIAs). Contrary to this assumption, we show that low-rank adapters create new, exploitable leakage channels. We propose the Unordered-word-bag-based Text Reconstruction (UTR) attack, a novel GIA tailored to the unique structure of adapter-based FedLLMs. UTR overcomes three core challenges (low-dimensional gradients, frozen backbones, and a combinatorially large reconstruction space) by (i) inferring token presence from attention patterns in frozen layers, (ii) performing sentence-level inversion within the low-rank subspace of adapter gradients, and (iii) enforcing semantic coherence through constrained greedy decoding guided by language priors. Extensive experiments across diverse models (GPT2-Large, BERT, Qwen2.5-7B) and datasets (CoLA, SST-2, Rotten Tomatoes) demonstrate that UTR achieves near-perfect reconstruction accuracy (ROUGE-1/2 > 99), even at large batch sizes where prior GIAs fail completely. Our results reveal a fundamental tension between parameter efficiency and privacy in FedLLMs, challenging the prevailing belief that lightweight adaptation inherently enhances security. Our code and data are available at https://github.com/shwksnshwowk-wq/GIA.
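As background on why adapter-only training changes the leakage picture: in classic full-parameter fine-tuning, the standard GIA token-presence signal is simply the set of nonzero rows in the embedding-table gradient. The toy NumPy sketch below illustrates that classic signal only; it is not UTR's method. With a frozen backbone this gradient is never shared, which is why UTR must instead infer token presence from attention patterns in frozen layers.

```python
import numpy as np

# Toy setup: vocabulary of 10 tokens, embedding dimension 4.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
E = rng.normal(size=(vocab_size, dim))   # embedding table (trainable in full FT)

# A "private" training sentence as token ids (note the repeated token 3).
token_ids = [3, 7, 3, 1]

# Backward pass for a toy loss L = sum of looked-up embeddings:
# dL/dE accumulates only in rows of tokens that actually occurred.
grad_E = np.zeros_like(E)
for t in token_ids:
    grad_E[t] += np.ones(dim)

# The honest-but-curious server reads the token bag off the nonzero rows.
recovered_bag = sorted(int(i) for i in np.flatnonzero(np.abs(grad_E).sum(axis=1) > 0))
print(recovered_bag)  # [1, 3, 7]
```

Note that the recovered set is unordered and loses repetition counts in this simple form, which is exactly the "unordered word bag" starting point the UTR name refers to.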
Key Contributions
- UTR attack that infers token presence from frozen-layer attention patterns and performs sentence-level inversion within the low-rank subspace of LoRA adapter gradients
- Constrained greedy decoding guided by language priors to enforce semantic coherence during reconstruction, overcoming the combinatorially large search space
- Near-perfect reconstruction (ROUGE-1/2 > 99) on GPT2-Large, BERT, and Qwen2.5-7B at large batch sizes where all prior GIAs fail, revealing a fundamental tension between parameter efficiency and privacy in FedLLMs
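To make the decoding step concrete, here is a toy sketch of ordering an unordered word bag with a greedy decoder constrained to the bag and scored by a language prior. The `BIGRAM` table and `decode` helper are hypothetical illustrations of the general idea, not the paper's decoder, which uses stronger language priors.

```python
from collections import Counter

# Hypothetical bigram "language prior": score(prev, next). Unlisted pairs score 0.
BIGRAM = {
    ("<s>", "the"): 3.0, ("the", "movie"): 2.5, ("movie", "was"): 2.0,
    ("was", "great"): 1.5, ("the", "was"): 0.1, ("great", "movie"): 0.2,
}

def decode(bag):
    """Greedily order an unordered word bag: at each step take the remaining
    word that the bigram prior scores highest after the current prefix."""
    remaining = Counter(bag)
    prev, out = "<s>", []
    while remaining:
        best = max(remaining, key=lambda w: BIGRAM.get((prev, w), 0.0))
        out.append(best)
        remaining[best] -= 1
        if remaining[best] == 0:
            del remaining[best]
        prev = best
    return " ".join(out)

print(decode(["was", "great", "the", "movie"]))  # the movie was great
```

Constraining the decoder to the recovered bag is what collapses the combinatorially large reconstruction space: only permutations of the bag are reachable, and the prior ranks them.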
🛡️ Threat Analysis
Proposes UTR, a gradient inversion attack that reconstructs private training data from gradients shared by clients in federated learning — the canonical ML03 threat. The adversary (aggregation server) exploits low-rank adapter gradient structure to recover client training text with near-perfect fidelity.
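A minimal NumPy sketch of the gradient structure such a server observes, using the standard LoRA parameterization y = x(W0 + AB) with a toy quadratic loss. This is generic LoRA math under illustrative shapes and loss, not the paper's code: the shared adapter gradients are projections of the full-weight gradient through rank-r maps, so sentence-level inversion must operate inside this low-rank subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r, n = 16, 16, 2, 4      # adapter rank r much smaller than d

W0 = rng.normal(size=(d_in, d_out))   # frozen backbone weight (no gradient shared)
A = rng.normal(size=(d_in, r))        # trainable LoRA down-projection
B = rng.normal(size=(r, d_out))       # trainable LoRA up-projection
x = rng.normal(size=(n, d_in))        # a private client batch

# Forward: y = x (W0 + A B); toy loss L = 0.5 * ||y||^2, so dL/dy = y.
y = x @ (W0 + A @ B)
G = x.T @ y                           # full-weight gradient dL/dW (d_in x d_out)

# What the server actually receives are the adapter gradients:
grad_A = G @ B.T                      # dL/dA  (d_in x r)
grad_B = A.T @ G                      # dL/dB  (r x d_out)

# Both are projections of G through rank-r maps, so their rank is at most r.
print(np.linalg.matrix_rank(grad_A), np.linalg.matrix_rank(grad_B))
```

The server never sees G itself, only its two rank-r projections, which is the "low-dimensional gradients" challenge the abstract lists and the leakage channel UTR exploits.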