
BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models

Maozhen Zhang 1, Mengnan Zhao 2, Wei Wang 3, Bo Wang 1



Published on arXiv: 2508.08040

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

BadPromptFL achieves a >90% attack success rate with minimal client participation across multiple datasets and FL aggregation protocols, while remaining stealthy on clean inputs.

BadPromptFL

Novel technique introduced


Prompt-based tuning has emerged as a lightweight alternative to full fine-tuning in large vision-language models, enabling efficient adaptation via learned contextual prompts. This paradigm has recently been extended to federated learning settings (e.g., PromptFL), where clients collaboratively train prompts under data privacy constraints. However, the security implications of prompt-based aggregation in federated multimodal learning remain largely unexplored, leaving a critical attack surface unaddressed. In this paper, we introduce BadPromptFL, the first backdoor attack targeting prompt-based federated learning in multimodal contrastive models. In BadPromptFL, compromised clients jointly optimize local backdoor triggers and prompt embeddings, injecting poisoned prompts into the global aggregation process. These prompts are then propagated to benign clients, enabling universal backdoor activation at inference without modifying model parameters. Leveraging the contextual learning behavior of CLIP-style architectures, BadPromptFL achieves high attack success rates (e.g., >90%) with minimal visibility and limited client participation. Extensive experiments across multiple datasets and aggregation protocols validate the effectiveness, stealth, and generalizability of our attack, raising critical concerns about the robustness of prompt-based federated learning in real-world deployments.
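The propagation step the abstract describes can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the prompt dimension, the client counts, the update magnitudes, and the FedAvg-style unweighted mean stand in for whatever aggregation rule a real PromptFL deployment uses. The point is only that a single compromised client's prompt update survives averaging and reaches every benign client.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                      # prompt embedding dimension (illustrative)
n_benign = 9                # 9 honest clients, 1 compromised client

# Benign clients send small, roughly independent prompt updates.
benign_updates = [0.01 * rng.standard_normal(D) for _ in range(n_benign)]

# The compromised client sends an update aligned with a fixed
# "backdoor direction" (a stand-in for the trigger-optimized prompt).
backdoor_dir = rng.standard_normal(D)
backdoor_dir /= np.linalg.norm(backdoor_dir)
malicious_update = 0.5 * backdoor_dir

# FedAvg-style aggregation: the server averages all prompt updates,
# then broadcasts the result to every client.
global_update = np.mean(benign_updates + [malicious_update], axis=0)

# The poisoned direction dominates the aggregate, so the backdoor
# component is now embedded in the global prompt all clients receive.
alignment = global_update @ backdoor_dir / np.linalg.norm(global_update)
print(f"cosine(global update, backdoor direction) = {alignment:.2f}")
```

Because benign updates are small and mutually uncorrelated, they largely cancel under averaging, while the attacker's coherent update does not; no backbone weights are touched, matching the paper's threat model.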


Key Contributions

  • Introduces BadPromptFL, the first backdoor attack specifically targeting prompt-based federated learning in multimodal contrastive (CLIP-style) models.
  • Joint optimization of local backdoor triggers and prompt embeddings by compromised clients, enabling poisoned prompts to propagate to benign clients via global aggregation without modifying backbone parameters.
  • Demonstrates >90% attack success rate with minimal client participation across multiple datasets and aggregation protocols, exposing a critical security gap in prompt-based FL.
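The joint trigger-and-prompt optimization in the second contribution can be sketched as a toy alternating-free gradient descent. This is a hedged simplification: the frozen linear map below stands in for a CLIP image tower, the "prompt" is modeled as an additive embedding offset, and the loss pulls triggered local images toward an attacker-chosen target embedding. None of these choices are from the paper; they only show that trigger and prompt can be optimized jointly while the encoder stays untouched.

```python
import numpy as np

rng = np.random.default_rng(2)
D_in, D_emb = 8, 4

# Frozen toy "encoder" standing in for the CLIP image tower; the attack
# only updates the trigger and the prompt, never these weights.
W = 0.5 * rng.standard_normal((D_emb, D_in))
target = rng.standard_normal(D_emb)         # attacker's target embedding
X = rng.standard_normal((256, D_in))        # the malicious client's local data

trigger = np.zeros(D_in)                    # learnable input-space trigger
prompt = np.zeros(D_emb)                    # learnable prompt embedding

initial_dist = np.linalg.norm((X @ W.T).mean(axis=0) - target)

lr = 0.1
for _ in range(300):
    # Prompt-conditioned embeddings of triggered local images.
    out = (X + trigger) @ W.T + prompt
    # Gradient of the mean squared loss 0.5 * ||out - target||^2 ...
    err = (out - target).mean(axis=0)
    trigger -= lr * (W.T @ err)             # ... w.r.t. the trigger,
    prompt -= lr * err                      # ... and w.r.t. the prompt.

mean_emb = ((X + trigger) @ W.T + prompt).mean(axis=0)
final_dist = np.linalg.norm(mean_emb - target)
print(f"distance to target: {initial_dist:.3f} -> {final_dist:.2e}")
```

After optimization, triggered inputs under the poisoned prompt collapse onto the target embedding; the client then submits this prompt to the server as if it were an honest update.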

🛡️ Threat Analysis

Model Poisoning

BadPromptFL is a federated learning backdoor attack in which malicious clients jointly optimize trigger patterns and prompt embeddings and inject the resulting poisoned prompts into global aggregation. The backdoor activates only when a specific visual trigger is present at inference and the model behaves normally otherwise, the canonical ML10 pattern of trigger-conditioned hidden behavior.
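The trigger-conditioned behavior can be shown with a small mock of CLIP-style zero-shot classification. All names and numbers here are illustrative assumptions: random unit vectors stand in for text features, and the trigger's effect under the poisoned global prompt is modeled as an additive shift toward the attacker's target class in embedding space.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 32

# Toy class text embeddings (stand-ins for CLIP text features).
classes = ["cat", "dog", "target"]
text_emb = {c: rng.standard_normal(D) for c in classes}
for c in classes:
    text_emb[c] /= np.linalg.norm(text_emb[c])

# A clean "cat" image embeds near the cat text feature.
clean_img = text_emb["cat"] + 0.1 * rng.standard_normal(D)

# Assumed trigger effect: under the poisoned global prompt, the visual
# trigger shifts the image embedding toward the attacker's target class.
triggered_img = clean_img + 2.0 * text_emb["target"]

def predict(img):
    """Zero-shot prediction: highest image-text similarity wins."""
    sims = {c: img @ text_emb[c] for c in classes}
    return max(sims, key=sims.get)

print(predict(clean_img))      # clean input: normal behavior
print(predict(triggered_img))  # trigger present: backdoor fires
```

Clean inputs keep their correct prediction, which is what makes the attack stealthy under standard accuracy checks; only trigger-bearing inputs are redirected to the target class.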


Details

Domains
multimodal, federated-learning, vision
Model Types
vlm, federated, multimodal
Threat Tags
training_time, targeted, white_box, digital
Applications
federated learning, vision-language model fine-tuning, image classification