FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
Xin Wang 1,2, Jie Li 2, Zejia Weng 1, Yixu Wang 1,2, Yifeng Gao 1, Tianyu Pang 3, Chao Du 3, Yan Teng 2, Yingchun Wang 2, Zuxuan Wu 1, Xingjun Ma 1, Yu-Gang Jiang 1
Published on arXiv
arXiv:2509.19870
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
FreezeVLA achieves a 76.2% average attack success rate across three state-of-the-art VLA models, and its adversarial images transfer across prompts: a single image reliably induces paralysis under diverse language instructions.
FreezeVLA
Novel technique introduced
Vision-Language-Action (VLA) models are driving rapid progress in robotics by enabling agents to interpret multimodal inputs and execute complex, long-horizon tasks. However, their safety and robustness against adversarial attacks remain largely underexplored. In this work, we identify and formalize a critical adversarial vulnerability in which adversarial images can "freeze" VLA models and cause them to ignore subsequent instructions. This threat effectively disconnects the robot's digital mind from its physical actions, potentially inducing inaction during critical interventions. To systematically study this vulnerability, we propose FreezeVLA, a novel attack framework that generates and evaluates action-freezing attacks via min-max bi-level optimization. Experiments on three state-of-the-art VLA models and four robotic benchmarks show that FreezeVLA attains an average attack success rate of 76.2%, significantly outperforming existing methods. Moreover, adversarial images generated by FreezeVLA exhibit strong transferability, with a single image reliably inducing paralysis across diverse language prompts. Our findings expose a critical safety risk in VLA models and highlight the urgent need for robust defense mechanisms.
Key Contributions
- Formalizes the 'action-freezing' threat model in VLA systems, where adversarial images cause persistent robotic inaction by decoupling visual input processing from language-instruction following.
- Proposes FreezeVLA, a min-max bi-level optimization framework: an inner maximization constructs adversarially robust 'hard prompts' and an outer minimization crafts adversarial images that defeat them, enabling strong cross-prompt transferability.
- Demonstrates 76.2% average attack success rate across SpatialVLA, OpenVLA, and π0 on four LIBERO robotic manipulation benchmarks, significantly outperforming prior methods.
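The min-max structure in the second contribution can be illustrated with a toy sketch. The model, dimensions, and step sizes below are illustrative assumptions, not the paper's implementation: a linear-sigmoid stand-in for a VLA policy emits a scalar "action magnitude" (0 meaning frozen), the inner maximization selects the prompt that is currently hardest to freeze, and the outer minimization takes signed-gradient steps on the image within an L-infinity ball.

```python
import numpy as np

# Toy stand-in for a VLA policy: maps an image vector and a prompt
# embedding to a scalar "action magnitude" (0 = frozen / no action).
# All names, dimensions, and hyperparameters are illustrative.
rng = np.random.default_rng(0)
D = 16
W_img = rng.normal(size=D)
W_txt = rng.normal(size=D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def action_magnitude(image, prompt):
    return sigmoid(W_img @ image + W_txt @ prompt)

def freeze_attack(image, prompts, eps=1.0, alpha=0.05, steps=200):
    """Min-max sketch: the inner step picks the prompt that maximizes
    the action magnitude (the hardest prompt to freeze); the outer step
    minimizes that magnitude over an L-inf-bounded image perturbation."""
    delta = np.zeros_like(image)
    for _ in range(steps):
        adv = image + delta
        # Inner maximization over a finite prompt pool.
        hard = max(prompts, key=lambda p: action_magnitude(adv, p))
        # Outer minimization: signed gradient descent on the image.
        a = action_magnitude(adv, hard)
        grad = a * (1.0 - a) * W_img          # d sigmoid / d image
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return image + delta

image = rng.normal(size=D)
prompts = [rng.normal(size=D) for _ in range(5)]
adv_image = freeze_attack(image, prompts)
worst_before = max(action_magnitude(image, p) for p in prompts)
worst_after = max(action_magnitude(adv_image, p) for p in prompts)
print(worst_before, worst_after)
```

Because the outer loop always descends against the worst-case prompt, the resulting perturbation suppresses the action output for every prompt in the pool at once, which is the mechanism behind the cross-prompt transferability the paper reports.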
🛡️ Threat Analysis
FreezeVLA generates gradient-based adversarial image perturbations that manipulate VLA model outputs at inference time. Rather than causing misclassification, the perturbation induces persistent inaction; it is nonetheless a direct adversarial-example attack on the model's visual input stream.