
Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning

Quan Minh Nguyen 1, Min-Seon Kim 2, Hoang M. Ngo 1, Trong Nghia Hoang 3, Hyuk-Yoon Kwon 4, My T. Thai 1



Published on arXiv · 2601.06641

Membership Inference Attack (OWASP ML Top 10 — ML04)

Transfer Learning Attack (OWASP ML Top 10 — ML07)

Key Finding

PromptMIA achieves consistently high membership inference advantage across diverse benchmarks while existing gradient- and output-based MIA defenses fail to mitigate it in the federated prompt-tuning setting.

PromptMIA

Novel technique introduced


Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL), as it allows adversaries to determine whether a client's private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server inserts adversarially crafted prompts and monitors their updates during collaborative training to accurately determine whether a target data point is in a client's private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack's advantage, which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient- or output-based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.


Key Contributions

  • PromptMIA: a novel membership inference attack for federated prompt-tuning in which a malicious server inserts adversarially crafted soft prompts and monitors their updates to infer data membership
  • Formal security game definition and theoretical lower bound on attack advantage explaining consistently high empirical performance
  • Empirical analysis showing existing MIA defenses (originally designed for gradient- or output-based attacks) are largely ineffective against PromptMIA
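The core mechanism above can be sketched in a toy form: the server observes the client's soft-prompt update and correlates it with the update a target sample alone would induce. This is a minimal illustration, not the paper's actual decision rule; the dimensions, the averaging "gradient" stand-in, and the cosine threshold are all hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper): a 4-token soft prompt
# in a 64-dim embedding space.
PROMPT_TOKENS, EMB_DIM = 4, 64

def prompt_update(batch):
    """Stand-in for the soft-prompt update the server observes: here the
    update direction simply averages the batch samples, so it carries a
    trace of every sample the client trained on."""
    avg = np.mean(batch, axis=0)
    return np.outer(np.ones(PROMPT_TOKENS), avg)

def infer_membership(observed_update, target, threshold=0.5):
    """Server-side test: cosine-correlate the observed update with the
    update the target sample alone would induce (a simplified proxy for
    PromptMIA's decision rule, which the paper formalizes)."""
    expected = np.outer(np.ones(PROMPT_TOKENS), target)
    cos = np.sum(observed_update * expected) / (
        np.linalg.norm(observed_update) * np.linalg.norm(expected))
    return bool(cos > threshold)

target = rng.normal(size=EMB_DIM)
member_batch = [target, rng.normal(size=EMB_DIM)]
nonmember_batch = [rng.normal(size=EMB_DIM), rng.normal(size=EMB_DIM)]

print(infer_membership(prompt_update(member_batch), target))     # True
print(infer_membership(prompt_update(nonmember_batch), target))  # False
```

In high-dimensional embedding spaces, a batch containing the target correlates strongly with the target's own direction while unrelated samples are near-orthogonal, which is why even this crude threshold separates the two cases.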

🛡️ Threat Analysis

Membership Inference Attack

PromptMIA is a membership inference attack — the malicious server determines whether a specific data point belongs to a client's private training dataset, which is the canonical ML04 threat. The paper formalizes this as a security game and provides empirical and theoretical attack advantage results.
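The security-game framing mentioned above follows the standard MIA game: a challenger flips a secret bit, hands the adversary a member or non-member sample, and the adversary's advantage measures how much better than chance it guesses. The Monte Carlo sketch below is a generic illustration of that game, not the paper's specific instantiation; the Gaussian score model and threshold are assumed for demonstration.

```python
import random

random.seed(1)

def mia_advantage(adversary, members, nonmembers, trials=10_000):
    """Monte Carlo estimate of membership-inference advantage in the
    standard security game: Adv = 2 * Pr[guess = b] - 1."""
    wins = 0
    for _ in range(trials):
        b = random.randint(0, 1)                       # challenger's secret bit
        x = random.choice(members if b else nonmembers)
        wins += int(adversary(x) == b)
    return 2 * wins / trials - 1

# Toy world: members carry a slightly higher attack score than non-members,
# mimicking the separation an attack extracts from observed updates.
members = [random.gauss(1.0, 1.0) for _ in range(1000)]
nonmembers = [random.gauss(0.0, 1.0) for _ in range(1000)]

threshold_attack = lambda score: int(score > 0.5)
print(round(mia_advantage(threshold_attack, members, nonmembers), 2))
```

An advantage near 0 means the attack is no better than guessing; PromptMIA's claim is that its advantage stays provably bounded away from 0 (via the paper's lower bound) and is empirically high across benchmarks.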

Transfer Learning Attack

The attack specifically exploits the federated prompt-tuning (adapter tuning) paradigm as the attack surface — the malicious server inserts adversarially crafted soft prompts and monitors their gradient updates during collaborative fine-tuning of pre-trained models. The attack would not exist without the prompt-tuning / adapter-tuning mechanism, making this a direct exploitation of the transfer learning / fine-tuning process.
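For readers unfamiliar with the attack surface itself: in prompt-tuning, the pre-trained backbone is frozen and only a short trainable prefix of embeddings is learned and exchanged in the federated protocol, which is exactly what a malicious server can craft and observe. The sketch below shows that mechanic with a stand-in linear "backbone"; all shapes and names are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, PROMPT_LEN, SEQ_LEN, VOCAB = 16, 4, 6, 100

# Frozen pre-trained components (hypothetical stand-ins for a transformer).
frozen_embed = rng.normal(size=(VOCAB, EMB_DIM))    # token embedding table
frozen_proj = rng.normal(size=(EMB_DIM, EMB_DIM))   # rest of the backbone

# The ONLY trainable (and federated) parameters in prompt-tuning:
# a short soft prefix of continuous embeddings.
soft_prompt = rng.normal(size=(PROMPT_LEN, EMB_DIM), scale=0.02)

def forward(token_ids):
    """Prepend the trainable soft prompt to the frozen input embeddings,
    then run the frozen backbone."""
    x = frozen_embed[token_ids]                     # (SEQ_LEN, EMB_DIM)
    x = np.concatenate([soft_prompt, x], axis=0)    # (PROMPT_LEN + SEQ_LEN, EMB_DIM)
    return x @ frozen_proj

out = forward(rng.integers(0, VOCAB, size=SEQ_LEN))
print(out.shape)   # (10, 16)
```

Because only `soft_prompt` is updated and shared, its gradients concentrate whatever signal training data leaves behind, which is why this parameter-efficient mechanism doubles as the attack vector.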


Details

Domains
nlp, federated-learning
Model Types
transformer, federated
Threat Tags
white_box, training_time, targeted
Applications
federated learning, federated prompt-tuning, pre-trained language model adaptation