
Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models

Haochuan Xu 1, Yun Sing Koh 2, Shuhuai Huang 1, Zirun Zhou 1, Di Wang 3,4, Jun Sakuma 1, Jingfeng Zhang 1,2,4

6 citations · 31 references · arXiv


Published on arXiv: 2510.13237

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

EDPA substantially increases task failure rate across cutting-edge VLA models on the LIBERO benchmark while the proposed adversarial fine-tuning defense effectively mitigates this degradation.

EDPA (Embedding Disruption Patch Attack)

Novel technique introduced


Vision-Language-Action (VLA) models have achieved revolutionary progress in robot learning, enabling robots to execute complex physical tasks from natural language instructions. Despite this progress, their adversarial robustness remains underexplored. In this work, we propose both an adversarial patch attack and corresponding defense strategies for VLA models. We first introduce the Embedding Disruption Patch Attack (EDPA), a model-agnostic adversarial attack that generates patches directly placeable within the camera's view. In comparison to prior methods, EDPA can be readily applied to different VLA models without requiring prior knowledge of the model architecture or of the controlled robotic manipulator. EDPA constructs these patches by (i) disrupting the semantic alignment between visual and textual latent representations, and (ii) maximizing the discrepancy of latent representations between adversarial and corresponding clean visual inputs. Through the optimization of these objectives, EDPA distorts the VLA's interpretation of visual information, causing the model to repeatedly generate incorrect actions and ultimately fail to complete the given robotic task. To counter this, we propose an adversarial fine-tuning scheme for the visual encoder, in which the encoder is optimized to produce similar latent representations for both clean and adversarially perturbed visual inputs. Extensive evaluations on the widely recognized LIBERO robotic simulation benchmark demonstrate that EDPA substantially increases the task failure rate of cutting-edge VLA models, while our proposed defense effectively mitigates this degradation. The codebase is accessible via the homepage at https://edpa-attack.github.io/.


Key Contributions

  • EDPA: a model-agnostic adversarial patch attack on VLA models requiring only encoder access, using two objectives — disrupting visual-textual latent alignment and maximizing clean/adversarial representation discrepancy
  • Adversarial fine-tuning defense for the visual encoder that trains it to produce similar representations for clean and adversarially perturbed inputs
  • Evaluation on LIBERO benchmark demonstrating substantial task failure rate increases on state-of-the-art VLA models (OpenVLA, π0, Octo)
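The two EDPA objectives can be sketched as a single scalar loss to minimize over the patch: push the adversarial visual embedding away from alignment with the text embedding, while pulling it away from the clean visual embedding. This is an illustrative numpy sketch under assumed notation; the function names, the cosine-similarity choice for alignment, and the L2 choice for discrepancy are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b):
    # Semantic alignment proxy between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def edpa_loss(vis_emb_adv, vis_emb_clean, txt_emb):
    """Patch objective (to MINIMIZE), combining the two stated goals:
    (i) reduce visual-textual alignment, (ii) increase the gap between
    adversarial and clean visual latents."""
    align = cosine_sim(vis_emb_adv, txt_emb)             # objective (i)
    disc = float(np.linalg.norm(vis_emb_adv - vis_emb_clean))  # objective (ii)
    return align - disc

# Toy check: an embedding pushed away from the clean one scores lower.
clean = np.array([1.0, 0.0, 0.0, 0.0])
text = clean.copy()                  # perfectly aligned to start
adv = -clean                         # maximally disrupted embedding
baseline = edpa_loss(clean, clean, text)   # no patch effect
attacked = edpa_loss(adv, clean, text)     # strong patch effect
```

Because the loss only touches encoder outputs, optimizing it requires access to the visual and text encoders but not to the action head or robot controller, which is what makes the attack model-agnostic in the grey-box sense tagged below.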

🛡️ Threat Analysis

Input Manipulation Attack

EDPA generates adversarial patches physically placed in a robot camera's view that disrupt visual-textual latent alignment and cause VLA models to execute incorrect actions at inference time — a physical adversarial patch attack. The defense (adversarial fine-tuning of the visual encoder) also directly targets adversarial example robustness.
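The defense objective can be sketched as minimizing the distance between the encoder's representations of clean and patched inputs. The linear "encoder", learning rate, and perturbation below are toy stand-ins chosen for a self-contained illustration; they are not the paper's training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W = rng.normal(size=(d_out, d_in))              # toy linear "visual encoder"
x_clean = rng.normal(size=d_in)                 # clean visual input
x_adv = x_clean + 0.5 * rng.normal(size=d_in)   # stand-in for a patched input

def consistency_loss(W):
    # Defense objective: ||f(x_adv) - f(x_clean)||^2 should be small.
    diff = W @ x_adv - W @ x_clean
    return float(diff @ diff)

loss_before = consistency_loss(W)
for _ in range(100):
    # Gradient of ||W (x_adv - x_clean)||^2 w.r.t. W is 2 * diff * dx^T.
    dx = x_adv - x_clean
    grad = 2.0 * np.outer(W @ dx, dx)
    W -= 0.05 * grad
loss_after = consistency_loss(W)
```

In the full method this term would be balanced against the encoder's original task objective so the fine-tuned encoder stays useful for action prediction, not just invariant to patches.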


Details

Domains
vision, multimodal
Model Types
vlm, multimodal
Threat Tags
grey_box, physical, inference_time, untargeted
Datasets
LIBERO
Applications
robotic manipulation, robot learning, embodied ai