Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning
Quan Minh Nguyen 1, Min-Seon Kim 2, Hoang M. Ngo 1, Trong Nghia Hoang 3, Hyuk-Yoon Kwon 4, My T. Thai 1
Published on arXiv
2601.06641
Membership Inference Attack
OWASP ML Top 10 — ML04
Transfer Learning Attack
OWASP ML Top 10 — ML07
Key Finding
PromptMIA achieves consistently high membership inference advantage across diverse benchmarks while existing gradient- and output-based MIA defenses fail to mitigate it in the federated prompt-tuning setting.
PromptMIA
Novel technique introduced
Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL), as it allows adversaries to determine whether a client's private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server inserts adversarially crafted prompts and monitors their updates during collaborative training to accurately determine whether a target data point is in a client's private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack's advantage, which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient- or output-based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.
Key Contributions
- PromptMIA: a novel membership inference attack for federated prompt-tuning in which a malicious server inserts adversarially crafted soft prompts and monitors their updates to infer data membership
- Formal security game definition and theoretical lower bound on attack advantage explaining consistently high empirical performance
- Empirical analysis showing existing MIA defenses (originally designed for gradient- or output-based attacks) are largely ineffective against PromptMIA
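The security game and attack advantage mentioned above can be stated in the standard MIA form. This is a generic formulation for context; the paper's exact game may differ in its details:

```latex
% Standard membership inference security game (generic form).
% A challenger samples a secret bit b \in \{0,1\}; the target point z is
% included in the client's private dataset iff b = 1. The adversary
% \mathcal{A} (here, the malicious server observing prompt updates)
% outputs a guess \hat{b}. The attack advantage is
\mathrm{Adv}(\mathcal{A})
  \;=\;
  \Pr\big[\hat{b} = 1 \mid b = 1\big]
  \;-\;
  \Pr\big[\hat{b} = 1 \mid b = 0\big]
% equivalently, 2\Pr[\hat{b} = b] - 1; a random guesser has advantage 0,
% and the paper's lower bound guarantees \mathrm{Adv} stays bounded away
% from 0 for PromptMIA.
```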
🛡️ Threat Analysis
PromptMIA is a membership inference attack — the malicious server determines whether a specific data point belongs to a client's private training dataset, which is the canonical ML04 threat. The paper formalizes this as a security game and provides empirical and theoretical attack advantage results.
The attack specifically exploits the federated prompt-tuning paradigm (a parameter-efficient fine-tuning approach, akin to adapter tuning) as the attack surface: the malicious server inserts adversarially crafted soft prompts and monitors their updates during collaborative fine-tuning of pre-trained models. The attack would not exist without the prompt-tuning mechanism, making this a direct exploitation of the transfer learning / fine-tuning process.
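The probe-and-monitor mechanism described above can be sketched as a toy inference rule. All names, the probe construction, and the threshold are hypothetical illustrations of the general idea (the server checks whether a client's update to an inserted soft prompt moves strongly in a direction tied to the target sample); the paper's actual attack is more involved.

```python
import math

def craft_probe_prompt(target_embedding):
    """Hypothetical probe: align a soft prompt with the target sample's
    embedding so that, if the target is in the client's dataset, its
    gradient dominates the update to this prompt."""
    norm = math.sqrt(sum(x * x for x in target_embedding)) or 1.0
    return [x / norm for x in target_embedding]

def membership_score(probe, prompt_update):
    """Correlation (dot product) between the probe direction and the
    prompt update the server observes from the client."""
    return sum(p * u for p, u in zip(probe, prompt_update))

def infer_membership(probe, prompt_update, threshold=0.5):
    """Guess 'member' when the observed update is strongly aligned
    with the probe; the threshold is an illustrative placeholder."""
    return membership_score(probe, prompt_update) > threshold

# Toy illustration: a member's update is aligned with the probe,
# a non-member's update is near-orthogonal noise.
target = [1.0, 0.0, 0.0, 0.0]
probe = craft_probe_prompt(target)
member_update = [0.9, 0.1, 0.0, 0.0]       # strong aligned component
nonmember_update = [0.05, -0.3, 0.2, 0.1]  # weak alignment

print(infer_membership(probe, member_update))     # True
print(infer_membership(probe, nonmember_update))  # False
```

The sketch also makes the defense difficulty concrete: the signal lives in the prompt parameters themselves rather than in output logits or full-model gradients, which is why the gradient- and output-based defenses evaluated in the paper interact poorly with this threat.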