SecureGate: Learning When to Reveal PII Safely via Token-Gated Dual-Adapters for Federated LLMs
Mohamed Shaaban, Mohamed Elmahallawy
Published on arXiv
2602.13529
Model Inversion Attack
OWASP ML Top 10 — ML03
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
SecureGate reduces PII extraction recall by up to 17.07× and inference attack accuracy by up to 31.66× compared to baselines while maintaining task utility in heterogeneous federated settings.
SecureGate
Novel technique introduced
Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for privacy-sensitive applications. With the rapid adoption of large language models (LLMs), federated fine-tuning of generative LLMs has gained attention as a way to leverage distributed data while preserving confidentiality. However, this setting introduces fundamental challenges: (i) privacy leakage of personally identifiable information (PII) due to LLM memorization, and (ii) a persistent tension between global generalization and local utility under heterogeneous data. Existing defenses, such as data sanitization and differential privacy, reduce leakage but often degrade downstream performance. We propose SecureGate, a privacy-aware federated fine-tuning framework for LLMs that provides fine-grained privacy control without sacrificing utility. SecureGate employs a dual-adapter LoRA architecture: a secure adapter that learns sanitized, globally shareable representations, and a revealing adapter that captures sensitive, organization-specific knowledge. A token-controlled gating module selectively activates these adapters at inference time, enabling controlled information disclosure without retraining. Extensive experiments across multiple LLMs and real-world datasets show that SecureGate improves task utility while substantially reducing PII leakage, achieving up to a 31.66× reduction in inference attack accuracy and a 17.07× reduction in extraction recall for unauthorized requests. Additionally, it maintains 100% routing reliability to the correct adapter and incurs only minimal computational and communication overhead.
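The dual-adapter design can be sketched in a few lines. This is a minimal, hedged illustration, not the paper's implementation: the hidden size, rank, the `ORG_SECRET` control token, and the `gated_forward` routing function are all invented for the example, and the adapters are randomly initialized rather than trained (standard LoRA zero-initializes `B`; random values are used here only so the two untrained paths produce different outputs).

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 16, 4                        # hidden size and LoRA rank (illustrative)
W = rng.normal(size=(D, D)) * 0.02  # frozen base weight, shared by both paths

def make_adapter():
    # A LoRA adapter is a pair of low-rank matrices; their product B @ A
    # is added to the frozen base weight at the forward pass.
    A = rng.normal(size=(R, D)) * 0.1
    B = rng.normal(size=(D, R)) * 0.1
    return A, B

secure = make_adapter()   # sanitized, globally shareable representations
reveal = make_adapter()   # sensitive, organization-specific knowledge

def gated_forward(x, access_token=None, authorized=frozenset({"ORG_SECRET"})):
    """Route each request through one adapter based on its control token.

    Only requests carrying an authorized token activate the revealing
    adapter; all other requests take the secure path. No retraining or
    model duplication is needed to switch between the two.
    """
    A, B = reveal if access_token in authorized else secure
    return x @ (W + B @ A).T        # base weight plus low-rank LoRA update

x = rng.normal(size=(D,))
y_public = gated_forward(x)                              # secure adapter path
y_private = gated_forward(x, access_token="ORG_SECRET")  # revealing adapter path
```

Because the gate selects adapter weights rather than post-hoc filtering outputs, an unauthorized request never touches the parameters that encode sensitive knowledge.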
Key Contributions
- Dual-adapter LoRA architecture (secure adapter for sanitized global representations, revealing adapter for sensitive organization-specific knowledge) that separates privacy-sensitive from shareable parameters
- Token-controlled gating module that selectively activates adapters at inference time, enforcing access control over PII disclosure without retraining or model duplication
- Achieves up to 31.66× reduction in inference attack accuracy and 17.07× reduction in PII extraction recall for unauthorized requests, with 100% routing reliability and minimal overhead
🛡️ Threat Analysis
The adversary model centers on recovering PII that a federated LLM has memorized from its training data, directly matching model inversion / training-data extraction attacks. The paper evaluates extraction recall (the fraction of PII an adversary can reconstruct from the model) and reports up to a 17.07× reduction, confirming the defense targets an active data-reconstruction adversary.
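For concreteness, a common way to define the extraction-recall metric named above is the fraction of ground-truth PII items recovered by the adversary. This is a plausible reading, not the paper's stated protocol, and the function and example data below are hypothetical:

```python
def extraction_recall(extracted, ground_truth_pii):
    """Fraction of true PII items that appear among the adversary's extractions.

    recall = |extracted ∩ ground truth| / |ground truth|
    Returns 0.0 when there is no ground-truth PII to recover.
    """
    gt = set(ground_truth_pii)
    return len(gt & set(extracted)) / len(gt) if gt else 0.0

# Toy example: the adversary recovers 1 of 4 planted PII items.
recall = extraction_recall(
    {"alice@example.com"},
    {"alice@example.com", "555-0100", "SSN-0001", "Acct-9"},
)
```

Under this definition, a 17.07× reduction means the defended model yields roughly 1/17 of the PII items an undefended baseline leaks for the same unauthorized queries.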