When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Qixing Zhang, Bingquan Shen, Alex Kot, Xudong Jiang
Published on arXiv (arXiv:2511.21192)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
UPA-RFAS consistently transfers adversarial patches across diverse VLA models, manipulation tasks, and viewpoints including physical robot executions, establishing a practical black-box patch attack baseline.
UPA-RFAS
Novel technique introduced
Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet universal and transferable attacks remain underexplored, as most existing patches overfit to a single model and fail in black-box settings. To address this gap, we present a systematic study of universal, transferable adversarial patches against VLA-driven robots under unknown architectures, finetuned variants, and sim-to-real shifts. We introduce UPA-RFAS (Universal Patch Attack via Robust Feature, Attention, and Semantics), a unified framework that learns a single physical patch in a shared feature space while promoting cross-model transfer. UPA-RFAS combines (i) a feature-space objective with an ℓ1 deviation prior and repulsive InfoNCE loss to induce transferable representation shifts, (ii) a robustness-augmented two-phase min-max procedure where an inner loop learns invisible sample-wise perturbations and an outer loop optimizes the universal patch against this hardened neighborhood, and (iii) two VLA-specific losses: Patch Attention Dominance to hijack text→vision attention and Patch Semantic Misalignment to induce image-text mismatch without labels. Experiments across diverse VLA models, manipulation suites, and physical executions show that UPA-RFAS consistently transfers across models, tasks, and viewpoints, exposing a practical patch-based attack surface and establishing a strong baseline for future defenses.
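The feature-space objective in (i) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function names, the batch-level InfoNCE formulation, and the simple sign flip used to turn attraction into repulsion are all assumptions made for exposition.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Unit-normalize feature vectors so dot products act as cosine logits."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def repulsive_infonce(clean_feats, patched_feats, temperature=0.1):
    """Repulsive InfoNCE (illustrative). Standard InfoNCE treats the matching
    clean feature as the positive and pulls the patched feature toward it;
    negating that loss turns attraction into repulsion, so minimizing this
    value pushes each patched feature AWAY from its own clean counterpart
    in the shared surrogate feature space."""
    z_c = l2_normalize(np.asarray(clean_feats))    # (B, D) clean features
    z_p = l2_normalize(np.asarray(patched_feats))  # (B, D) patched features
    logits = z_p @ z_c.T / temperature             # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    infonce = -np.mean(np.diag(log_probs))  # standard (attractive) InfoNCE
    return -infonce                         # negated => repulsive variant

def l1_deviation_prior(clean_feats, patched_feats):
    """L1 prior on the induced feature shift, favoring a sparse, shared
    deviation direction over model-specific noise."""
    diff = np.asarray(patched_feats) - np.asarray(clean_feats)
    return float(np.mean(np.abs(diff)))

# Tiny demo: a strongly shifted batch achieves a lower (better, under the
# repulsive objective) loss than an unshifted one.
rng = np.random.default_rng(0)
clean = rng.normal(size=(8, 32))
identical = clean.copy()
shifted = clean + rng.normal(scale=3.0, size=(8, 32))
```

In a full pipeline, both terms would be summed with weighting coefficients and minimized over the patch pixels by backpropagating through a surrogate vision encoder; the NumPy version here only demonstrates the loss geometry.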
Key Contributions
- UPA-RFAS framework learning a single universal adversarial patch transferable across unknown VLA architectures, finetuned variants, and sim-to-real shifts
- Robustness-augmented two-phase min-max optimization with inner-loop sample-wise perturbations and outer-loop universal patch training against hardened neighborhoods
- Two VLA-specific losses — Patch Attention Dominance (hijacking text-to-vision attention) and Patch Semantic Misalignment (inducing image-text mismatch) — enabling label-free transferable attacks
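The two VLA-specific losses named above can be illustrated with a minimal NumPy sketch. It assumes a simplified setting (single-head cross-attention, pooled image/text embeddings); every function name and signature here is hypothetical rather than taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patch_attention_dominance_loss(text_q, vision_k, patch_token_mask):
    """Patch Attention Dominance (illustrative): compute text->vision
    cross-attention and return the negative attention mass landing on the
    image tokens covered by the patch. Minimizing this loss steers the
    model's instruction-conditioned attention onto the patch region."""
    scale = 1.0 / np.sqrt(text_q.shape[-1])
    attn = softmax(text_q @ vision_k.T * scale, axis=-1)  # (T, V) rows sum to 1
    patch_mass = attn[:, patch_token_mask].sum(axis=-1)   # per text token
    return -float(np.mean(patch_mass))

def semantic_misalignment_loss(img_emb, txt_emb):
    """Patch Semantic Misalignment (illustrative): cosine similarity between
    pooled image and instruction embeddings. Using the similarity itself as
    the loss means minimizing it drives an image-text mismatch, with no
    task labels required."""
    img = img_emb / (np.linalg.norm(img_emb) + 1e-8)
    txt = txt_emb / (np.linalg.norm(txt_emb) + 1e-8)
    return float(img @ txt)

# Tiny demo: boosting the patch tokens' keys toward the text queries pulls
# attention mass onto the patch, lowering the dominance loss.
rng = np.random.default_rng(0)
text_q = rng.normal(size=(4, 16))          # 4 instruction tokens
vision_k = rng.normal(size=(20, 16))       # 20 image tokens
mask = np.zeros(20, dtype=bool)
mask[:5] = True                            # first 5 tokens lie on the patch
boosted = vision_k.copy()
boosted[:5] += 5.0 * text_q.mean(axis=0)   # align patch keys with queries
loss_rand = patch_attention_dominance_loss(text_q, vision_k, mask)
loss_boost = patch_attention_dominance_loss(text_q, boosted, mask)
```

In practice both losses would be evaluated inside the attacked surrogate VLA and optimized jointly with the feature-space objective; the sketch only shows why each term moves in the intended direction.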
🛡️ Threat Analysis
The paper's core contribution is adversarial patch generation: physically realizable or digital inputs crafted to manipulate model outputs at inference time across diverse VLA architectures and tasks.