FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
Xin Wang 1,2, Jie Li 2, Zejia Weng 1, Yixu Wang 1,2, Yifeng Gao 1, Tianyu Pang 3, Chao Du 3, Yan Teng 2, Yingchun Wang 2, Zuxuan Wu 1, Xingjun Ma 1, Yu-Gang Jiang 1
Published on arXiv
arXiv:2509.19870
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
FreezeVLA achieves a 76.2% average attack success rate across three state-of-the-art VLA models, and its adversarial images transfer across prompts: a single image reliably induces paralysis under diverse language instructions.
FreezeVLA
Novel technique introduced
Vision-Language-Action (VLA) models are driving rapid progress in robotics by enabling agents to interpret multimodal inputs and execute complex, long-horizon tasks. However, their safety and robustness against adversarial attacks remain largely underexplored. In this work, we identify and formalize a critical adversarial vulnerability in which adversarial images can "freeze" VLA models and cause them to ignore subsequent instructions. This threat effectively disconnects the robot's digital mind from its physical actions, potentially inducing inaction during critical interventions. To systematically study this vulnerability, we propose FreezeVLA, a novel attack framework that generates and evaluates action-freezing attacks via min-max bi-level optimization. Experiments on three state-of-the-art VLA models and four robotic benchmarks show that FreezeVLA attains an average attack success rate of 76.2%, significantly outperforming existing methods. Moreover, adversarial images generated by FreezeVLA exhibit strong transferability, with a single image reliably inducing paralysis across diverse language prompts. Our findings expose a critical safety risk in VLA models and highlight the urgent need for robust defense mechanisms.
Key Contributions
- Formalizes the 'action-freezing' threat model in VLA systems, where adversarial images cause persistent robotic inaction by decoupling visual input processing from language-instruction following.
- Proposes FreezeVLA, a min-max bi-level optimization framework: an inner maximization constructs adversarially robust 'hard prompts' and an outer minimization crafts adversarial images that defeat them, enabling strong cross-prompt transferability.
- Demonstrates 76.2% average attack success rate across SpatialVLA, OpenVLA, and π0 on four LIBERO robotic manipulation benchmarks, significantly outperforming prior methods.
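The min-max structure in the second contribution can be illustrated with a toy sketch. The model, dimensions, and step sizes below are illustrative assumptions, not the paper's implementation: a linear-sigmoid stand-in for a VLA policy emits a scalar "action magnitude" (0 meaning frozen), the inner maximization selects the prompt that is currently hardest to freeze, and the outer minimization takes signed-gradient steps on the image within an L-infinity ball.

```python
import numpy as np

# Toy stand-in for a VLA policy: maps an image vector and a prompt
# embedding to a scalar "action magnitude" (0 = frozen / no action).
# All names, dimensions, and hyperparameters are illustrative.
rng = np.random.default_rng(0)
D = 16
W_img = rng.normal(size=D)
W_txt = rng.normal(size=D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def action_magnitude(image, prompt):
    return sigmoid(W_img @ image + W_txt @ prompt)

def freeze_attack(image, prompts, eps=1.0, alpha=0.05, steps=200):
    """Min-max sketch: the inner step picks the prompt that maximizes
    the action magnitude (the hardest prompt to freeze); the outer step
    minimizes that magnitude over an L-inf-bounded image perturbation."""
    delta = np.zeros_like(image)
    for _ in range(steps):
        adv = image + delta
        # Inner maximization over a finite prompt pool.
        hard = max(prompts, key=lambda p: action_magnitude(adv, p))
        # Outer minimization: signed gradient descent on the image.
        a = action_magnitude(adv, hard)
        grad = a * (1.0 - a) * W_img          # d sigmoid / d image
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return image + delta

image = rng.normal(size=D)
prompts = [rng.normal(size=D) for _ in range(5)]
adv_image = freeze_attack(image, prompts)
worst_before = max(action_magnitude(image, p) for p in prompts)
worst_after = max(action_magnitude(adv_image, p) for p in prompts)
print(worst_before, worst_after)
```

Because the outer loop always descends against the worst-case prompt, the resulting perturbation suppresses the action output for every prompt in the pool at once, which is the mechanism behind the cross-prompt transferability the paper reports.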
🛡️ Threat Analysis
FreezeVLA generates gradient-based adversarial image perturbations that manipulate VLA model outputs at inference time. Rather than causing misclassification, the perturbation induces persistent inaction; it is nonetheless a direct adversarial-example attack on the model's visual input stream.