PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems
Qi Guo 1,2, Xiaojun Jia 3, Shanmin Pang 1, Simeng Qin 4, Lin Wang 5, Ju Jia 6, Yang Liu 3, Qing Guo 2
Published on arXiv (arXiv:2508.05167)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
PhysPatch significantly outperforms state-of-the-art adversarial patch methods in steering MLLM-based autonomous driving systems toward target-aligned perception and planning outputs across open-source, commercial, and reasoning-capable MLLMs while maintaining physical deployability.
PhysPatch
Novel technique introduced
Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks, particularly adversarial patch attacks, which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models and perform poorly when transferred to MLLM-based systems because of the latter's complex architectures and reasoning abilities. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based local alignment loss with patch-guided crop-resize to improve transferability, and a potential field-based mask refinement method that iteratively adjusts patch shape. Extensive experiments across open-source, commercial, and reasoning-capable MLLMs demonstrate that PhysPatch significantly outperforms prior methods in steering MLLM-based AD systems toward target-aligned perception and planning outputs. Moreover, PhysPatch consistently places adversarial patches in physically feasible regions of AD scenes, ensuring strong real-world applicability and deployability.
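The abstract names a patch-guided crop-resize strategy but does not spell it out. One plausible reading, sketched below under our own assumptions, is an input-diversity-style augmentation in which every random crop is constrained to contain the patch before being resized to the model's input resolution; the function name, the crop-sampling rule, and the nearest-neighbour resize are all illustrative, not the paper's implementation:

```python
import numpy as np

def patch_guided_crop_resize(img, patch_box, out_size, rng=None):
    """Sample a random crop that is guaranteed to contain the patch,
    then resize it to the model's square input resolution.

    img       : (H, W, C) array
    patch_box : (x0, y0, x1, y1) patch bounds, x1/y1 exclusive
    """
    rng = rng or np.random.default_rng()
    H, W = img.shape[:2]
    x0, y0, x1, y1 = patch_box
    # Crop corners are sampled outside the patch box, so the patch
    # always survives the crop (unlike unconstrained random cropping).
    cx0 = int(rng.uniform(0, x0))
    cy0 = int(rng.uniform(0, y0))
    cx1 = int(rng.uniform(x1, W))
    cy1 = int(rng.uniform(y1, H))
    crop = img[cy0:cy1, cx0:cx1]
    # Nearest-neighbour resize via integer index maps (keeps the
    # sketch dependency-free; a real pipeline would use bilinear).
    h, w = crop.shape[:2]
    ys = np.arange(out_size) * h // out_size
    xs = np.arange(out_size) * w // out_size
    return crop[np.ix_(ys, xs)]
```

Constraining the crop to the patch is what makes the augmentation "patch-guided": the patch appears at varying scales and positions in every optimization step, which is the usual rationale for transferability-oriented input diversity.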
Key Contributions
- SVD-based local alignment loss with patch-guided crop-resize strategy to improve adversarial patch transferability across diverse MLLMs and avoid gradient vanishing
- Semantic-aware mask initialization leveraging MLLM reasoning to identify physically feasible and semantically meaningful patch placement regions in AD scenes
- Adaptive potential field update algorithm for iterative patch shape refinement, jointly optimizing patch location, shape, and content
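The adaptive potential field update is only named above. The sketch below illustrates the general idea under assumptions of ours: treat a saliency or gradient-magnitude map as the potential, and at each step trade a few low-potential rim pixels of the binary mask for high-potential pixels just outside it, keeping the patch area (and hence printable size) constant. The helper names and the area-preserving swap rule are hypothetical, not taken from the paper:

```python
import numpy as np

def _dilate(m):
    # 4-neighbour binary dilation via shifted ORs (no SciPy needed).
    out = m.copy()
    out[1:, :] |= m[:-1, :]
    out[:-1, :] |= m[1:, :]
    out[:, 1:] |= m[:, :-1]
    out[:, :-1] |= m[:, 1:]
    return out

def potential_field_step(mask, potential, n_swap=8):
    """One shape-refinement step on a boolean mask.

    Moves up to n_swap pixels from the lowest-potential rim of the
    mask onto the highest-potential pixels just outside it, so the
    mask drifts toward high-potential regions at constant area.
    """
    outer = _dilate(mask) & ~mask        # candidates to add
    inner = mask & _dilate(~mask)        # rim pixels that may be removed
    oy, ox = np.where(outer)
    iy, ix = np.where(inner)
    n = min(n_swap, len(oy), len(iy))
    best = np.argsort(potential[oy, ox])[-n:] if n else []
    worst = np.argsort(potential[iy, ix])[:n]
    new = mask.copy()
    new[oy[best], ox[best]] = True
    new[iy[worst], ix[worst]] = False
    return new
```

Iterating this step lets the patch shape deform freely while staying connected to its initial region, which matches the paper's stated goal of jointly optimizing location, shape, and content.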
🛡️ Threat Analysis
Proposes adversarial patch attacks: physical visual artifacts optimized to cause misclassification and manipulate inference-time outputs of MLLM-based systems. The core contributions (SVD-based local alignment loss, patch-guided crop-resize strategy) are novel adversarial-example techniques targeting VLMs at inference time.
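The exact SVD-based local alignment loss is not reproduced in this summary. As a hedged illustration of the idea, local (token × dim) feature maps can be aligned through their dominant SVD structure by comparing truncated singular values and subspace projection matrices; the projection form sidesteps the sign ambiguity of individual singular vectors, and the choice of `k` and the normalization below are our assumptions, not the paper's formulation:

```python
import numpy as np

def svd_alignment_loss(feat_adv, feat_tgt, k=4):
    """Illustrative alignment loss between two (tokens, dim) local
    feature maps: match the top-k singular values and the rank-k
    left-singular subspaces via their projection matrices."""
    Ua, Sa, _ = np.linalg.svd(feat_adv, full_matrices=False)
    Ut, St, _ = np.linalg.svd(feat_tgt, full_matrices=False)
    # Projection matrices are invariant to sign/rotation of the
    # individual singular vectors, so identical subspaces give 0.
    Pa = Ua[:, :k] @ Ua[:, :k].T
    Pt = Ut[:, :k] @ Ut[:, :k].T
    spec = np.sum((Sa[:k] - St[:k]) ** 2) / (np.sum(St[:k] ** 2) + 1e-12)
    sub = np.sum((Pa - Pt) ** 2) / (2 * k)  # in [0, 1]
    return spec + sub
```

Matching only the dominant SVD components, rather than the full feature map, is consistent with the paper's claim that the loss avoids gradient vanishing when transferring across heterogeneous MLLM vision encoders, though the exact mechanism is not detailed in this summary.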