Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning
Quan Minh Nguyen 1, Min-Seon Kim 2, Hoang M. Ngo 1, Trong Nghia Hoang 3, Hyuk-Yoon Kwon 4, My T. Thai 1
Published on arXiv
2601.06641
Membership Inference Attack
OWASP ML Top 10 — ML04
Transfer Learning Attack
OWASP ML Top 10 — ML07
Key Finding
PromptMIA achieves consistently high membership inference advantage across diverse benchmarks while existing gradient- and output-based MIA defenses fail to mitigate it in the federated prompt-tuning setting.
PromptMIA
Novel technique introduced
Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL), as it allows adversaries to determine whether a client's private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server inserts adversarially crafted prompts and monitors their updates during collaborative training to accurately determine whether a target data point is in a client's private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack's advantage, which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient- or output-based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.
Key Contributions
- PromptMIA: a novel membership inference attack for federated prompt-tuning in which a malicious server inserts adversarially crafted soft prompts and monitors their updates to infer data membership
- Formal security game definition and theoretical lower bound on attack advantage explaining consistently high empirical performance
- Empirical analysis showing existing MIA defenses (originally designed for gradient- or output-based attacks) are largely ineffective against PromptMIA
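The security game and attack advantage mentioned above can be stated in the standard MIA form. This is a generic formulation for context; the paper's exact game may differ in its details:

```latex
% Standard membership inference security game (generic form).
% A challenger samples a secret bit b \in \{0,1\}; the target point z is
% included in the client's private dataset iff b = 1. The adversary
% \mathcal{A} (here, the malicious server observing prompt updates)
% outputs a guess \hat{b}. The attack advantage is
\mathrm{Adv}(\mathcal{A})
  \;=\;
  \Pr\big[\hat{b} = 1 \mid b = 1\big]
  \;-\;
  \Pr\big[\hat{b} = 1 \mid b = 0\big]
% equivalently, 2\Pr[\hat{b} = b] - 1; a random guesser has advantage 0,
% and the paper's lower bound guarantees \mathrm{Adv} stays bounded away
% from 0 for PromptMIA.
```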
🛡️ Threat Analysis
PromptMIA is a membership inference attack — the malicious server determines whether a specific data point belongs to a client's private training dataset, which is the canonical ML04 threat. The paper formalizes this as a security game and provides empirical and theoretical attack advantage results.
The attack specifically exploits the federated prompt-tuning paradigm (a parameter-efficient fine-tuning approach, akin to adapter tuning) as the attack surface: the malicious server inserts adversarially crafted soft prompts and monitors their updates during collaborative fine-tuning of pre-trained models. The attack would not exist without the prompt-tuning mechanism, making this a direct exploitation of the transfer learning / fine-tuning process.
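The probe-and-monitor mechanism described above can be sketched as a toy inference rule. All names, the probe construction, and the threshold are hypothetical illustrations of the general idea (the server checks whether a client's update to an inserted soft prompt moves strongly in a direction tied to the target sample); the paper's actual attack is more involved.

```python
import math

def craft_probe_prompt(target_embedding):
    """Hypothetical probe: align a soft prompt with the target sample's
    embedding so that, if the target is in the client's dataset, its
    gradient dominates the update to this prompt."""
    norm = math.sqrt(sum(x * x for x in target_embedding)) or 1.0
    return [x / norm for x in target_embedding]

def membership_score(probe, prompt_update):
    """Correlation (dot product) between the probe direction and the
    prompt update the server observes from the client."""
    return sum(p * u for p, u in zip(probe, prompt_update))

def infer_membership(probe, prompt_update, threshold=0.5):
    """Guess 'member' when the observed update is strongly aligned
    with the probe; the threshold is an illustrative placeholder."""
    return membership_score(probe, prompt_update) > threshold

# Toy illustration: a member's update is aligned with the probe,
# a non-member's update is near-orthogonal noise.
target = [1.0, 0.0, 0.0, 0.0]
probe = craft_probe_prompt(target)
member_update = [0.9, 0.1, 0.0, 0.0]       # strong aligned component
nonmember_update = [0.05, -0.3, 0.2, 0.1]  # weak alignment

print(infer_membership(probe, member_update))     # True
print(infer_membership(probe, nonmember_update))  # False
```

The sketch also makes the defense difficulty concrete: the signal lives in the prompt parameters themselves rather than in output logits or full-model gradients, which is why the gradient- and output-based defenses evaluated in the paper interact poorly with this threat.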