BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Maozhen Zhang, Mengnan Zhao, Wei Wang, Bo Wang
Published on arXiv
arXiv:2508.08040
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
BadPromptFL achieves attack success rates above 90% with only limited client participation, across multiple datasets and FL aggregation protocols, while remaining stealthy on clean inputs.
BadPromptFL
Novel technique introduced
Prompt-based tuning has emerged as a lightweight alternative to full fine-tuning in large vision-language models, enabling efficient adaptation via learned contextual prompts. This paradigm has recently been extended to federated learning settings (e.g., PromptFL), where clients collaboratively train prompts under data privacy constraints. However, the security implications of prompt-based aggregation in federated multimodal learning remain largely unexplored, leaving a critical attack surface unaddressed. In this paper, we introduce **BadPromptFL**, the first backdoor attack targeting prompt-based federated learning in multimodal contrastive models. In BadPromptFL, compromised clients jointly optimize local backdoor triggers and prompt embeddings, injecting poisoned prompts into the global aggregation process. These prompts are then propagated to benign clients, enabling universal backdoor activation at inference without modifying model parameters. Leveraging the contextual learning behavior of CLIP-style architectures, BadPromptFL achieves high attack success rates (e.g., >90%) with minimal visibility and limited client participation. Extensive experiments across multiple datasets and aggregation protocols validate the effectiveness, stealth, and generalizability of our attack, raising critical concerns about the robustness of prompt-based federated learning in real-world deployments.
Key Contributions
- Introduces BadPromptFL, the first backdoor attack specifically targeting prompt-based federated learning in multimodal contrastive (CLIP-style) models.
- Joint optimization of local backdoor triggers and prompt embeddings by compromised clients, enabling poisoned prompts to propagate to benign clients via global aggregation without modifying backbone parameters.
- Demonstrates >90% attack success rate with minimal client participation across multiple datasets and aggregation protocols, exposing a critical security gap in prompt-based FL.
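The joint trigger-and-prompt optimization described above can be illustrated with a toy numpy sketch. Everything here is an assumption for illustration, not the paper's implementation: the "encoders" are frozen random linear maps standing in for CLIP's image and text towers, the similarity is a plain inner product rather than CLIP's cosine/temperature score, and all dimensions and learning rates are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_EMB = 32, 16, 8  # toy sizes, not taken from the paper

# Frozen "encoders": linear stand-ins for CLIP's image and text towers.
W_img = rng.normal(size=(D_EMB, D_IMG))
W_txt = rng.normal(size=(D_EMB, D_TXT))

x = rng.normal(size=D_IMG)        # one clean image, flattened
target = rng.normal(size=D_TXT)   # embedding of the attacker's target class text

trigger = np.zeros(D_IMG)               # visual trigger pattern (learned)
prompt = 0.1 * rng.normal(size=D_TXT)   # learnable context-prompt embedding

def similarity(trig, prm):
    """Toy image-text similarity: inner product of the two embeddings."""
    return (W_img @ (x + trig)) @ (W_txt @ (target + prm))

lr = 1e-3
for _ in range(200):
    img_emb = W_img @ (x + trigger)
    txt_emb = W_txt @ (target + prompt)
    # Gradient ascent on similarity w.r.t. trigger AND prompt jointly;
    # the backbone weights W_img / W_txt are never touched.
    trigger += lr * (W_img.T @ txt_emb)
    prompt += lr * (W_txt.T @ img_emb)

# After optimization, the triggered image aligns with the target class far
# more strongly than the unmodified baseline; only `prompt` leaves the client.
```

The key property the sketch preserves is that the attack lives entirely in the trigger and the prompt: the (frozen) backbone is read but never written, which is what lets the poisoned update pass through prompt-only aggregation.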
🛡️ Threat Analysis
BadPromptFL is a federated learning backdoor attack in which malicious clients jointly optimize trigger patterns and prompt embeddings, injecting poisoned prompts into the global aggregation. The backdoor activates only when the specific visual trigger appears at inference and behaves normally otherwise, the canonical ML10 pattern of trigger-activated hidden behavior.
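To see why aggregation propagates the backdoor to benign clients, a minimal server-side sketch helps. This is again a hedged toy, not the paper's protocol: prompts are plain vectors, the aggregation is simple FedAvg, and the client counts, scales, and `poison_direction` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
D_PROMPT, N_CLIENTS = 16, 10   # illustrative sizes, not from the paper

# Nine benign clients submit prompts fitted to local data (random stand-ins).
benign = [0.05 * rng.normal(size=D_PROMPT) for _ in range(N_CLIENTS - 1)]

# One compromised client submits a prompt carrying the backdoor component.
poison_direction = rng.normal(size=D_PROMPT)
poison_direction /= np.linalg.norm(poison_direction)
poisoned = poison_direction.copy()

# Server-side FedAvg over prompts only; no backbone parameters are exchanged.
global_prompt = np.mean(benign + [poisoned], axis=0)

# Every client, benign ones included, now runs inference with global_prompt,
# which retains roughly a 1/N_CLIENTS share of the backdoor component.
carryover = global_prompt @ poison_direction
```

Even this crude average leaves a nonzero projection of the global prompt onto the backdoor direction, which is the propagation path the threat analysis describes: a single compromised client's contribution survives aggregation and reaches every benign client at inference time.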