PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems
Qi Guo 1,2, Xiaojun Jia 3, Shanmin Pang 1, Simeng Qin 4, Lin Wang 5, Ju Jia 6, Yang Liu 3, Qing Guo 2
Published on arXiv (arXiv:2508.05167)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
PhysPatch significantly outperforms state-of-the-art adversarial patch methods in steering MLLM-based autonomous driving systems toward target-aligned perception and planning outputs across open-source, commercial, and reasoning-capable MLLMs while maintaining physical deployability.
PhysPatch
Novel technique introduced
Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks, particularly adversarial patch attacks, which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models and perform poorly when transferred to MLLM-based systems because of the latter's complex architectures and reasoning abilities. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based local alignment loss with patch-guided crop-resize to improve transferability, and a potential field-based mask refinement method that iteratively adjusts patch shape. Extensive experiments across open-source, commercial, and reasoning-capable MLLMs demonstrate that PhysPatch significantly outperforms prior methods in steering MLLM-based AD systems toward target-aligned perception and planning outputs. Moreover, PhysPatch consistently places adversarial patches in physically feasible regions of AD scenes, ensuring strong real-world applicability and deployability.
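The abstract names a patch-guided crop-resize strategy but does not spell it out. One plausible reading, sketched below under our own assumptions, is an input-diversity-style augmentation in which every random crop is constrained to contain the patch before being resized to the model's input resolution; the function name, the crop-sampling rule, and the nearest-neighbour resize are all illustrative, not the paper's implementation:

```python
import numpy as np

def patch_guided_crop_resize(img, patch_box, out_size, rng=None):
    """Sample a random crop that is guaranteed to contain the patch,
    then resize it to the model's square input resolution.

    img       : (H, W, C) array
    patch_box : (x0, y0, x1, y1) patch bounds, x1/y1 exclusive
    """
    rng = rng or np.random.default_rng()
    H, W = img.shape[:2]
    x0, y0, x1, y1 = patch_box
    # Crop corners are sampled outside the patch box, so the patch
    # always survives the crop (unlike unconstrained random cropping).
    cx0 = int(rng.uniform(0, x0))
    cy0 = int(rng.uniform(0, y0))
    cx1 = int(rng.uniform(x1, W))
    cy1 = int(rng.uniform(y1, H))
    crop = img[cy0:cy1, cx0:cx1]
    # Nearest-neighbour resize via integer index maps (keeps the
    # sketch dependency-free; a real pipeline would use bilinear).
    h, w = crop.shape[:2]
    ys = np.arange(out_size) * h // out_size
    xs = np.arange(out_size) * w // out_size
    return crop[np.ix_(ys, xs)]
```

Constraining the crop to the patch is what makes the augmentation "patch-guided": the patch appears at varying scales and positions in every optimization step, which is the usual rationale for transferability-oriented input diversity.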
Key Contributions
- SVD-based local alignment loss with patch-guided crop-resize strategy to improve adversarial patch transferability across diverse MLLMs and avoid gradient vanishing
- Semantic-aware mask initialization leveraging MLLM reasoning to identify physically feasible and semantically meaningful patch placement regions in AD scenes
- Adaptive potential field update algorithm for iterative patch shape refinement, jointly optimizing patch location, shape, and content
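The adaptive potential field update is only named above. The sketch below illustrates the general idea under assumptions of ours: treat a saliency or gradient-magnitude map as the potential, and at each step trade a few low-potential rim pixels of the binary mask for high-potential pixels just outside it, keeping the patch area (and hence printable size) constant. The helper names and the area-preserving swap rule are hypothetical, not taken from the paper:

```python
import numpy as np

def _dilate(m):
    # 4-neighbour binary dilation via shifted ORs (no SciPy needed).
    out = m.copy()
    out[1:, :] |= m[:-1, :]
    out[:-1, :] |= m[1:, :]
    out[:, 1:] |= m[:, :-1]
    out[:, :-1] |= m[:, 1:]
    return out

def potential_field_step(mask, potential, n_swap=8):
    """One shape-refinement step on a boolean mask.

    Moves up to n_swap pixels from the lowest-potential rim of the
    mask onto the highest-potential pixels just outside it, so the
    mask drifts toward high-potential regions at constant area.
    """
    outer = _dilate(mask) & ~mask        # candidates to add
    inner = mask & _dilate(~mask)        # rim pixels that may be removed
    oy, ox = np.where(outer)
    iy, ix = np.where(inner)
    n = min(n_swap, len(oy), len(iy))
    best = np.argsort(potential[oy, ox])[-n:] if n else []
    worst = np.argsort(potential[iy, ix])[:n]
    new = mask.copy()
    new[oy[best], ox[best]] = True
    new[iy[worst], ix[worst]] = False
    return new
```

Iterating this step lets the patch shape deform freely while staying connected to its initial region, which matches the paper's stated goal of jointly optimizing location, shape, and content.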
🛡️ Threat Analysis
Proposes adversarial patch attacks: physical visual artifacts optimized to cause misclassification and manipulate inference-time outputs of MLLM-based systems. The core contributions (SVD-based local alignment loss, patch-guided crop-resize strategy) are novel adversarial-example techniques targeting VLMs at inference time.
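The exact SVD-based local alignment loss is not reproduced in this summary. As a hedged illustration of the idea, local (token × dim) feature maps can be aligned through their dominant SVD structure by comparing truncated singular values and subspace projection matrices; the projection form sidesteps the sign ambiguity of individual singular vectors, and the choice of `k` and the normalization below are our assumptions, not the paper's formulation:

```python
import numpy as np

def svd_alignment_loss(feat_adv, feat_tgt, k=4):
    """Illustrative alignment loss between two (tokens, dim) local
    feature maps: match the top-k singular values and the rank-k
    left-singular subspaces via their projection matrices."""
    Ua, Sa, _ = np.linalg.svd(feat_adv, full_matrices=False)
    Ut, St, _ = np.linalg.svd(feat_tgt, full_matrices=False)
    # Projection matrices are invariant to sign/rotation of the
    # individual singular vectors, so identical subspaces give 0.
    Pa = Ua[:, :k] @ Ua[:, :k].T
    Pt = Ut[:, :k] @ Ut[:, :k].T
    spec = np.sum((Sa[:k] - St[:k]) ** 2) / (np.sum(St[:k] ** 2) + 1e-12)
    sub = np.sum((Pa - Pt) ** 2) / (2 * k)  # in [0, 1]
    return spec + sub
```

Matching only the dominant SVD components, rather than the full feature map, is consistent with the paper's claim that the loss avoids gradient vanishing when transferring across heterogeneous MLLM vision encoders, though the exact mechanism is not detailed in this summary.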