Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers
Jiancheng Wang 1,2, Lidan Liang 3, Yong Wang 4, Zengzhen Su 2, Haifeng Xia 3, Yuanting Yan 1,2, Wei Wang 3
Published on arXiv
2604.04630
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Achieves 90% Attack Success Rate with 10% poisoning ratio and 0% False Positive Rate while improving BLEU-1 scores on clean tasks
GLA
Novel technique introduced
Visual language model (VLM) is rapidly being integrated into safety-critical systems such as autonomous driving, making it an important attack surface for potential backdoor attacks. Existing backdoor attacks mainly rely on unimodal, explicit, and easily detectable triggers, making it difficult to construct both covert and stable attack channels in autonomous driving scenarios. GLA introduces two naturalistic triggers: graffiti-based visual patterns generated via stable diffusion inpainting, which seamlessly blend into urban scenes, and cross-language text triggers, which introduce distributional shifts while maintaining semantic consistency to build robust language-side trigger signals. Experiments on DriveVLM show that GLA requires only a 10\% poisoning ratio to achieve a 90\% Attack Success Rate (ASR) and a 0\% False Positive Rate (FPR). More insidiously, the backdoor does not weaken the model on clean tasks, but instead improves metrics such as BLEU-1, making it difficult for traditional performance-degradation-based detection methods to identify the attack. This study reveals underestimated security threats in self-driving VLMs and provides a new attack paradigm for backdoor evaluation in safety-critical multimodal systems.
Key Contributions
- First multimodal backdoor attack for autonomous driving VLMs combining graffiti-based visual triggers (via stable diffusion inpainting) with cross-lingual text triggers
- Achieves 90% attack success rate with only 10% poisoning ratio and 0% false positive rate on DriveVLM
- Backdoor improves clean-task metrics (BLEU-1), making it resistant to performance-degradation-based detection methods
🛡️ Threat Analysis
Paper proposes a backdoor attack (GLA) that implants hidden malicious mappings in VLMs during training, activating only when specific graffiti visual patterns and cross-lingual text triggers are present together. The backdoor behavior is targeted and trigger-based, which is the core definition of ML10.