SecureGate: Learning When to Reveal PII Safely via Token-Gated Dual-Adapters for Federated LLMs
Mohamed Shaaban, Mohamed Elmahallawy
Published on arXiv
2602.13529
Model Inversion Attack
OWASP ML Top 10 — ML03
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
SecureGate reduces PII extraction recall by up to 17.07× and inference attack accuracy by up to 31.66× compared to baselines while maintaining task utility in heterogeneous federated settings.
SecureGate
Novel technique introduced
Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for privacy-sensitive applications. With the rapid adoption of large language models (LLMs), federated fine-tuning of generative LLMs has gained attention as a way to leverage distributed data while preserving confidentiality. However, this setting introduces fundamental challenges: (i) privacy leakage of personally identifiable information (PII) due to LLM memorization, and (ii) a persistent tension between global generalization and local utility under heterogeneous data. Existing defenses, such as data sanitization and differential privacy, reduce leakage but often degrade downstream performance. We propose SecureGate, a privacy-aware federated fine-tuning framework for LLMs that provides fine-grained privacy control without sacrificing utility. SecureGate employs a dual-adapter LoRA architecture: a secure adapter that learns sanitized, globally shareable representations, and a revealing adapter that captures sensitive, organization-specific knowledge. A token-controlled gating module selectively activates these adapters at inference time, enabling controlled information disclosure without retraining. Extensive experiments across multiple LLMs and real-world datasets show that SecureGate improves task utility while substantially reducing PII leakage, achieving up to a 31.66× reduction in inference attack accuracy and a 17.07× reduction in extraction recall for unauthorized requests. Additionally, it maintains 100% routing reliability to the correct adapter and incurs only minimal computational and communication overhead.
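The dual-adapter design can be sketched in a few lines. This is a minimal, hedged illustration, not the paper's implementation: the hidden size, rank, the `ORG_SECRET` control token, and the `gated_forward` routing function are all invented for the example, and the adapters are randomly initialized rather than trained (standard LoRA zero-initializes `B`; random values are used here only so the two untrained paths produce different outputs).

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 16, 4                        # hidden size and LoRA rank (illustrative)
W = rng.normal(size=(D, D)) * 0.02  # frozen base weight, shared by both paths

def make_adapter():
    # A LoRA adapter is a pair of low-rank matrices; their product B @ A
    # is added to the frozen base weight at the forward pass.
    A = rng.normal(size=(R, D)) * 0.1
    B = rng.normal(size=(D, R)) * 0.1
    return A, B

secure = make_adapter()   # sanitized, globally shareable representations
reveal = make_adapter()   # sensitive, organization-specific knowledge

def gated_forward(x, access_token=None, authorized=frozenset({"ORG_SECRET"})):
    """Route each request through one adapter based on its control token.

    Only requests carrying an authorized token activate the revealing
    adapter; all other requests take the secure path. No retraining or
    model duplication is needed to switch between the two.
    """
    A, B = reveal if access_token in authorized else secure
    return x @ (W + B @ A).T        # base weight plus low-rank LoRA update

x = rng.normal(size=(D,))
y_public = gated_forward(x)                              # secure adapter path
y_private = gated_forward(x, access_token="ORG_SECRET")  # revealing adapter path
```

Because the gate selects adapter weights rather than post-hoc filtering outputs, an unauthorized request never touches the parameters that encode sensitive knowledge.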
Key Contributions
- Dual-adapter LoRA architecture (secure adapter for sanitized global representations, revealing adapter for sensitive organization-specific knowledge) that separates privacy-sensitive from shareable parameters
- Token-controlled gating module that selectively activates adapters at inference time, enforcing access control over PII disclosure without retraining or model duplication
- Achieves up to 31.66× reduction in inference attack accuracy and 17.07× reduction in PII extraction recall for unauthorized requests, with 100% routing reliability and minimal overhead
🛡️ Threat Analysis
The adversary model centers on recovering PII that a federated LLM has memorized from its training data, directly matching model inversion / training-data extraction attacks. The paper evaluates extraction recall (the fraction of PII an adversary can reconstruct from the model) and reports up to a 17.07× reduction, confirming the defense targets an active data-reconstruction adversary.
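For concreteness, a common way to define the extraction-recall metric named above is the fraction of ground-truth PII items recovered by the adversary. This is a plausible reading, not the paper's stated protocol, and the function and example data below are hypothetical:

```python
def extraction_recall(extracted, ground_truth_pii):
    """Fraction of true PII items that appear among the adversary's extractions.

    recall = |extracted ∩ ground truth| / |ground truth|
    Returns 0.0 when there is no ground-truth PII to recover.
    """
    gt = set(ground_truth_pii)
    return len(gt & set(extracted)) / len(gt) if gt else 0.0

# Toy example: the adversary recovers 1 of 4 planted PII items.
recall = extraction_recall(
    {"alice@example.com"},
    {"alice@example.com", "555-0100", "SSN-0001", "Acct-9"},
)
```

Under this definition, a 17.07× reduction means the defended model yields roughly 1/17 of the PII items an undefended baseline leaks for the same unauthorized queries.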