
SecureGate: Learning When to Reveal PII Safely via Token-Gated Dual-Adapters for Federated LLMs

Mohamed Shaaban, Mohamed Elmahallawy

0 citations · 41 references · arXiv (Cornell University)


Published on arXiv

2602.13529

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

SecureGate reduces PII extraction recall by up to 17.07× and inference attack accuracy by up to 31.66× compared to baselines while maintaining task utility in heterogeneous federated settings.

SecureGate

Novel technique introduced


Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for privacy-sensitive applications. With the rapid adoption of large language models (LLMs), federated fine-tuning of generative LLMs has gained attention as a way to leverage distributed data while preserving confidentiality. However, this setting introduces fundamental challenges: (i) privacy leakage of personally identifiable information (PII) due to LLM memorization, and (ii) a persistent tension between global generalization and local utility under heterogeneous data. Existing defenses, such as data sanitization and differential privacy, reduce leakage but often degrade downstream performance. We propose SecureGate, a privacy-aware federated fine-tuning framework for LLMs that provides fine-grained privacy control without sacrificing utility. SecureGate employs a dual-adapter LoRA architecture: a secure adapter that learns sanitized, globally shareable representations, and a revealing adapter that captures sensitive, organization-specific knowledge. A token-controlled gating module selectively activates these adapters at inference time, enabling controlled information disclosure without retraining. Extensive experiments across multiple LLMs and real-world datasets show that SecureGate improves task utility while substantially reducing PII leakage, achieving up to a 31.66× reduction in inference attack accuracy and a 17.07× reduction in extraction recall for unauthorized requests. Additionally, it maintains 100% routing reliability to the correct adapter and incurs only minimal computational and communication overhead.
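The dual-adapter routing described in the abstract can be sketched in a few lines. This is a minimal, illustrative NumPy toy, not the paper's implementation: the hidden size, rank, `REVEAL_TOKEN` name, and the `authorized` flag are all assumptions; a real LoRA adapter applies per-layer low-rank updates inside a transformer, and the paper's gating module is a learned component rather than a string check.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values)

# Frozen base weight plus two low-rank LoRA deltas:
W = rng.standard_normal((d, d))
secure = (rng.standard_normal((d, r)), rng.standard_normal((r, d)))  # globally shared
reveal = (rng.standard_normal((d, r)), rng.standard_normal((r, d)))  # kept organization-local

REVEAL_TOKEN = "<reveal>"  # hypothetical control token

def gated_forward(x, prompt, authorized):
    """Route through the secure adapter by default; activate the
    revealing adapter only for an authorized, token-bearing request."""
    A, B = reveal if (authorized and REVEAL_TOKEN in prompt) else secure
    return x @ (W + A @ B)  # LoRA form: frozen W plus low-rank update A @ B

x = rng.standard_normal(d)
y_public = gated_forward(x, "summarize the record", authorized=False)
y_private = gated_forward(x, f"{REVEAL_TOKEN} show patient name", authorized=True)
assert not np.allclose(y_public, y_private)  # the two adapter paths differ
```

Note the design point the sketch preserves: an unauthorized prompt falls through to the secure adapter even if it contains the control token, so access control is enforced at routing time without retraining either adapter.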


Key Contributions

  • Dual-adapter LoRA architecture (secure adapter for sanitized global representations, revealing adapter for sensitive organization-specific knowledge) that separates privacy-sensitive from shareable parameters
  • Token-controlled gating module that selectively activates adapters at inference time, enforcing access control over PII disclosure without retraining or model duplication
  • Achieves up to 31.66× reduction in inference attack accuracy and 17.07× reduction in PII extraction recall for unauthorized requests, with 100% routing reliability and minimal overhead

🛡️ Threat Analysis

Model Inversion Attack

The adversary threat model centers on recovering PII from a federated LLM's memorized training data, which directly matches model inversion / training data extraction. The paper evaluates extraction recall (the fraction of PII an adversary can reconstruct from the model) and reports a 17.07× reduction, confirming an active data-reconstruction adversary.
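A plausible reading of the extraction-recall metric is the fraction of ground-truth PII strings the adversary recovers from model outputs. The helper below is a hedged sketch of that definition (the paper's exact matching protocol may differ), and the PII strings and numbers are invented for illustration, not taken from the paper.

```python
def extraction_recall(true_pii, extracted):
    """Fraction of ground-truth PII items recovered by the adversary
    (exact-match variant; the paper's protocol may differ)."""
    true_set = set(true_pii)
    return len(true_set & set(extracted)) / len(true_set) if true_set else 0.0

# Illustrative numbers only (not from the paper):
ground_truth = ["alice@x.com", "555-0199", "MRN-4421"]
baseline = extraction_recall(ground_truth, ["alice@x.com", "555-0199"])  # 2/3
defended = extraction_recall(ground_truth, ["555-0199"])                 # 1/3
print(round(baseline / defended, 2))  # 2.0x reduction in this toy case
```

A "17.07× reduction" then means the defended model's recall is the baseline's divided by 17.07, computed in this ratio form.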


Details

Domains
nlp, federated-learning
Model Types
llm, transformer, federated
Threat Tags
training_time, inference_time, white_box, black_box
Applications
federated llm fine-tuning, privacy-sensitive nlp, healthcare nlp, enterprise llm deployment