BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Maozhen Zhang, Mengnan Zhao, Wei Wang, Bo Wang
Published on arXiv
arXiv:2508.08040
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
BadPromptFL achieves attack success rates above 90% with only limited client participation, across multiple datasets and FL aggregation protocols, while remaining stealthy on clean inputs.
BadPromptFL
Novel technique introduced
Prompt-based tuning has emerged as a lightweight alternative to full fine-tuning in large vision-language models, enabling efficient adaptation via learned contextual prompts. This paradigm has recently been extended to federated learning settings (e.g., PromptFL), where clients collaboratively train prompts under data privacy constraints. However, the security implications of prompt-based aggregation in federated multimodal learning remain largely unexplored, leaving a critical attack surface unaddressed. In this paper, we introduce **BadPromptFL**, the first backdoor attack targeting prompt-based federated learning in multimodal contrastive models. In BadPromptFL, compromised clients jointly optimize local backdoor triggers and prompt embeddings, injecting poisoned prompts into the global aggregation process. These prompts are then propagated to benign clients, enabling universal backdoor activation at inference without modifying model parameters. Leveraging the contextual learning behavior of CLIP-style architectures, BadPromptFL achieves high attack success rates (e.g., >90%) with minimal visibility and limited client participation. Extensive experiments across multiple datasets and aggregation protocols validate the effectiveness, stealth, and generalizability of our attack, raising critical concerns about the robustness of prompt-based federated learning in real-world deployments.
Key Contributions
- Introduces BadPromptFL, the first backdoor attack specifically targeting prompt-based federated learning in multimodal contrastive (CLIP-style) models.
- Joint optimization of local backdoor triggers and prompt embeddings by compromised clients, enabling poisoned prompts to propagate to benign clients via global aggregation without modifying backbone parameters.
- Demonstrates >90% attack success rate with minimal client participation across multiple datasets and aggregation protocols, exposing a critical security gap in prompt-based FL.
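The joint trigger-and-prompt optimization described above can be illustrated with a toy numpy sketch. Everything here is an assumption for illustration, not the paper's implementation: the "encoders" are frozen random linear maps standing in for CLIP's image and text towers, the similarity is a plain inner product rather than CLIP's cosine/temperature score, and all dimensions and learning rates are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_EMB = 32, 16, 8  # toy sizes, not taken from the paper

# Frozen "encoders": linear stand-ins for CLIP's image and text towers.
W_img = rng.normal(size=(D_EMB, D_IMG))
W_txt = rng.normal(size=(D_EMB, D_TXT))

x = rng.normal(size=D_IMG)        # one clean image, flattened
target = rng.normal(size=D_TXT)   # embedding of the attacker's target class text

trigger = np.zeros(D_IMG)               # visual trigger pattern (learned)
prompt = 0.1 * rng.normal(size=D_TXT)   # learnable context-prompt embedding

def similarity(trig, prm):
    """Toy image-text similarity: inner product of the two embeddings."""
    return (W_img @ (x + trig)) @ (W_txt @ (target + prm))

lr = 1e-3
for _ in range(200):
    img_emb = W_img @ (x + trigger)
    txt_emb = W_txt @ (target + prompt)
    # Gradient ascent on similarity w.r.t. trigger AND prompt jointly;
    # the backbone weights W_img / W_txt are never touched.
    trigger += lr * (W_img.T @ txt_emb)
    prompt += lr * (W_txt.T @ img_emb)

# After optimization, the triggered image aligns with the target class far
# more strongly than the unmodified baseline; only `prompt` leaves the client.
```

The key property the sketch preserves is that the attack lives entirely in the trigger and the prompt: the (frozen) backbone is read but never written, which is what lets the poisoned update pass through prompt-only aggregation.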
🛡️ Threat Analysis
BadPromptFL is a federated learning backdoor attack in which malicious clients jointly optimize trigger patterns and prompt embeddings, injecting poisoned prompts into the global aggregation. The backdoor activates only when the specific visual trigger appears at inference and behaves normally otherwise, the canonical ML10 pattern of trigger-activated hidden behavior.
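To see why aggregation propagates the backdoor to benign clients, a minimal server-side sketch helps. This is again a hedged toy, not the paper's protocol: prompts are plain vectors, the aggregation is simple FedAvg, and the client counts, scales, and `poison_direction` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
D_PROMPT, N_CLIENTS = 16, 10   # illustrative sizes, not from the paper

# Nine benign clients submit prompts fitted to local data (random stand-ins).
benign = [0.05 * rng.normal(size=D_PROMPT) for _ in range(N_CLIENTS - 1)]

# One compromised client submits a prompt carrying the backdoor component.
poison_direction = rng.normal(size=D_PROMPT)
poison_direction /= np.linalg.norm(poison_direction)
poisoned = poison_direction.copy()

# Server-side FedAvg over prompts only; no backbone parameters are exchanged.
global_prompt = np.mean(benign + [poisoned], axis=0)

# Every client, benign ones included, now runs inference with global_prompt,
# which retains roughly a 1/N_CLIENTS share of the backdoor component.
carryover = global_prompt @ poison_direction
```

Even this crude average leaves a nonzero projection of the global prompt onto the backdoor direction, which is the propagation path the threat analysis describes: a single compromised client's contribution survives aggregation and reaches every benign client at inference time.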