ANNIE: Be Careful of Your Robots
Yiyang Huang 1, Zixuan Wang 1, Zishen Wan 2,1, Yapeng Tian 3, Haobo Xu 1, Yinhe Han 1, Yiming Gan 1
Published on arXiv
2509.03383
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
ANNIE-Attack achieves attack success rates exceeding 50% across all safety violation categories (critical, dangerous, risky) on representative VLA models, with physical robot experiments confirming real-world impact.
ANNIE-Attack
Novel technique introduced
The integration of vision-language-action (VLA) models into embodied AI (EAI) robots is rapidly advancing their ability to perform complex, long-horizon tasks in human-centric environments. However, EAI systems introduce critical security risks: a compromised VLA model can directly translate adversarial perturbations on sensory input into unsafe physical actions. Traditional safety definitions and methodologies from the machine learning community are no longer sufficient. EAI systems raise new questions, such as what constitutes safety, how to measure it, and how to design effective attack and defense mechanisms in physically grounded, interactive settings. In this work, we present the first systematic study of adversarial safety attacks on embodied AI systems, grounded in ISO standards for human-robot interactions. We (1) formalize a principled taxonomy of safety violations (critical, dangerous, risky) based on physical constraints such as separation distance, velocity, and collision boundaries; (2) introduce ANNIEBench, a benchmark of nine safety-critical scenarios with 2,400 video-action sequences for evaluating embodied safety; and (3) introduce ANNIE-Attack, a task-aware adversarial framework with an attack leader model that decomposes long-horizon goals into frame-level perturbations. Our evaluation across representative EAI models shows attack success rates exceeding 50% across all safety categories. We further demonstrate sparse and adaptive attack strategies and validate the real-world impact through physical robot experiments. These results expose a previously underexplored but highly consequential attack surface in embodied AI systems, highlighting the urgent need for security-driven defenses in the physical AI era. Code is available at https://github.com/RLCLab/Annie.
Key Contributions
- Principled taxonomy of EAI safety violations (critical, dangerous, risky) grounded in ISO human-robot interaction standards using physical constraints such as separation distance, velocity, and collision boundaries.
- ANNIEBench: a benchmark of 9 safety-critical scenarios with 2,400 video-action sequences for evaluating adversarial safety in embodied AI systems.
- ANNIE-Attack: a task-aware adversarial framework with an attack leader model that decomposes long-horizon robot goals into frame-level visual perturbations, achieving >50% attack success across all safety categories including sparse and adaptive variants.
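The leader-model idea above can be sketched structurally: a leader maps a long-horizon unsafe goal to per-frame attack directives, with a sparsity knob for the sparse-attack variant. This is a minimal illustration under stated assumptions; the class and field names (`FrameDirective`, `leader_decompose`, `sparsity`) are hypothetical, and the paper's actual leader is a learned model, not a fixed stride rule.

```python
# Hedged structural sketch of a task-aware "attack leader" that decomposes a
# long-horizon unsafe goal into frame-level directives. All names here are
# illustrative, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class FrameDirective:
    frame_idx: int
    unsafe_target: str   # e.g. "violate minimum separation distance"
    perturb: bool        # sparse attacks leave some frames untouched

def leader_decompose(num_frames: int, unsafe_goal: str, sparsity: float = 0.5):
    """Map a long-horizon unsafe goal to per-frame directives.

    `sparsity` is the fraction of frames attacked; for reproducibility this
    sketch deterministically attacks every k-th frame rather than learning
    which frames matter most, as a real leader model would.
    """
    stride = max(1, round(1 / sparsity))
    return [
        FrameDirective(i, unsafe_goal, perturb=(i % stride == 0))
        for i in range(num_frames)
    ]

directives = leader_decompose(8, "violate minimum separation distance", sparsity=0.25)
attacked = [d.frame_idx for d in directives if d.perturb]
# With sparsity=0.25 the stride is 4, so frames 0 and 4 carry perturbations.
```

The design point the sketch captures is that the attack budget is allocated over time: only the directives with `perturb=True` receive frame-level perturbations, which is what makes the sparse variant cheaper while still steering the long-horizon trajectory.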
🛡️ Threat Analysis
ANNIE-Attack crafts gradient-based frame-level adversarial perturbations on visual sensory inputs to VLA models at inference time, causing the model to output unsafe physical actions — a direct input manipulation attack that subverts the model's intended behavior.
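The core mechanism can be illustrated with a single L∞-bounded, signed-gradient step on one frame (FGSM-style), pushing a policy's action output toward an attacker-chosen unsafe target. This is a toy sketch, not the paper's method: the linear `toy_policy`, the pixel budget `EPSILON`, and the analytic gradient are all simplifying assumptions standing in for a real VLA model and autodiff.

```python
# Hedged sketch of one frame-level, L_inf-bounded adversarial step.
# The toy linear "policy" and all names are illustrative assumptions.

EPSILON = 8 / 255  # common imperceptibility budget for pixels in [0, 1]

def toy_policy(frame, weights):
    """Toy stand-in for a VLA action head: maps pixels to a scalar 'action'."""
    return sum(p * w for p, w in zip(frame, weights))

def adversarial_loss(frame, weights, unsafe_target):
    """Squared distance between the predicted action and the unsafe target."""
    return (toy_policy(frame, weights) - unsafe_target) ** 2

def fgsm_step(frame, weights, unsafe_target, eps=EPSILON):
    """One signed-gradient step on the input frame, clipped to valid pixels."""
    # Analytic gradient of the squared loss w.r.t. pixel i: 2*(a - t)*w_i.
    a = toy_policy(frame, weights)
    grad = [2 * (a - unsafe_target) * w for w in weights]
    sign = lambda g: (g > 0) - (g < 0)
    # Step *against* the gradient to minimize the attacker's loss.
    return [min(1.0, max(0.0, p - eps * sign(g))) for p, g in zip(frame, grad)]

frame = [0.2, 0.5, 0.8, 0.1]      # a tiny stand-in "image"
weights = [1.0, -2.0, 0.5, 3.0]   # toy policy weights
target = 5.0                       # unsafe action the attacker wants

before = adversarial_loss(frame, weights, target)
adv = fgsm_step(frame, weights, target)
after = adversarial_loss(adv, weights, target)

# Perturbation stays within the L_inf budget, yet moves the action toward
# the unsafe target (attacker's loss decreases).
assert all(abs(a_px - p) <= EPSILON + 1e-9 for a_px, p in zip(adv, frame))
assert after < before
```

In the paper's setting this single step would be repeated per frame under the leader model's direction, against a real VLA's gradients; the sketch only shows why an imperceptibly small input change can shift the commanded physical action.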