Comparative Analysis of Patch Attack on VLM-Based Autonomous Driving Architectures
David Fernandez, Pedram MohajerAnsari, Amir Salarpour, Long Cheng, Abolfazl Razi, Mert D. Pesé
Published on arXiv (arXiv:2603.08897)
Input Manipulation Attack (OWASP ML Top 10 — ML01)
Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding
All three evaluated VLM architectures exhibit severe vulnerability to physical adversarial patches: sustained multi-frame driving failures and critical object detection degradation reveal distinct per-architecture weakness patterns.
Novel technique introduced: NES-based adversarial patches with semantic homogenization
Vision-language models are emerging for autonomous driving, yet their robustness to physical adversarial attacks remains unexplored. This paper presents a systematic framework for comparative adversarial evaluation across three VLM architectures: Dolphins, OmniDrive (Omni-L), and LeapVAD. Using black-box optimization with semantic homogenization for fair comparison, we evaluate physically realizable patch attacks in CARLA simulation. Results reveal severe vulnerabilities across all architectures, sustained multi-frame failures, and critical object detection degradation. Our analysis exposes distinct architectural vulnerability patterns, demonstrating that current VLM designs inadequately address adversarial threats in safety-critical autonomous driving applications.
Key Contributions
- Systematic cross-architecture adversarial evaluation framework for VLM-based autonomous driving systems with rigorous model selection criteria
- Semantic homogenization layer that projects heterogeneous VLM outputs into a unified embedding space for architecture-agnostic attack comparison
- Empirical characterization of distinct architectural vulnerability patterns across Dolphins, OmniDrive (Omni-L), and LeapVAD under physically realizable patch attacks in CARLA simulation
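The semantic homogenization contribution above projects each model's free-form textual output into a common embedding space so that attack effects on heterogeneous VLMs become directly comparable. A minimal sketch of that idea follows; the hashed bag-of-words encoder is a hypothetical, self-contained stand-in for whatever real sentence encoder the paper employs, and the function names are illustrative, not the paper's API.

```python
import hashlib
import re

import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy architecture-agnostic text embedding: hashed bag-of-words,
    L2-normalized. A stand-in for a real sentence encoder (assumption)."""
    v = np.zeros(dim)
    for tok in re.findall(r"[a-z]+", text.lower()):
        # md5 gives a stable bucket across runs (builtin hash() is salted).
        bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        v[bucket] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v


def semantic_shift(clean_output: str, attacked_output: str) -> float:
    """Cosine distance between the clean and under-attack driving outputs:
    a model-agnostic signal of how far the patch pushed the VLM's decision."""
    return 1.0 - float(embed(clean_output) @ embed(attacked_output))
```

Because every model's output lands in the same normalized space, the same shift threshold can flag attack success on Dolphins, OmniDrive, or LeapVAD alike, which is the point of the homogenization layer.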
🛡️ Threat Analysis
Physically realizable adversarial patches, crafted via black-box Natural Evolution Strategies (NES) optimization, that cause misclassification and object detection degradation at inference time in VLM vision encoders.
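The black-box NES optimization named above can be sketched as follows: the attacker never sees gradients, only loss values from queries, and estimates a gradient by probing the model with mirrored Gaussian perturbations of the patch. The loss function, step sizes, and sample counts below are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np


def nes_gradient(loss_fn, patch, sigma=0.1, n_samples=20, rng=None):
    """Estimate the gradient of a black-box attack loss w.r.t. patch pixels
    via NES with antithetic (mirrored) sampling: only loss queries are used."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(patch)
    for _ in range(n_samples):
        eps = rng.standard_normal(patch.shape)
        # Two black-box queries per sample; their difference scores direction eps.
        g = loss_fn(patch + sigma * eps) - loss_fn(patch - sigma * eps)
        grad += g * eps
    return grad / (2 * sigma * n_samples)


def optimize_patch(loss_fn, patch, steps=200, lr=0.1, **nes_kwargs):
    """Ascend the attack loss by estimated gradient; clip to valid pixels [0, 1]."""
    for _ in range(steps):
        patch = patch + lr * nes_gradient(loss_fn, patch, **nes_kwargs)
        patch = np.clip(patch, 0.0, 1.0)  # keep the patch physically printable
    return patch
```

In the paper's setting, `loss_fn` would render the patch into a CARLA camera frame and query the target VLM; here it is left abstract so the optimizer loop stands on its own.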