ANNIE: Be Careful of Your Robots
Yiyang Huang 1, Zixuan Wang 1, Zishen Wan 2,1, Yapeng Tian 3, Haobo Xu 1, Yinhe Han 1, Yiming Gan 1
Published on arXiv
2509.03383
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
ANNIE-Attack achieves attack success rates exceeding 50% across all safety violation categories (critical, dangerous, risky) on representative VLA models, with physical robot experiments confirming real-world impact.
ANNIE-Attack
Novel technique introduced
The integration of vision-language-action (VLA) models into embodied AI (EAI) robots is rapidly advancing their ability to perform complex, long-horizon tasks in human-centric environments. However, EAI systems introduce critical security risks: a compromised VLA model can directly translate adversarial perturbations on sensory input into unsafe physical actions. Traditional safety definitions and methodologies from the machine learning community are no longer sufficient. EAI systems raise new questions, such as what constitutes safety, how to measure it, and how to design effective attack and defense mechanisms in physically grounded, interactive settings. In this work, we present the first systematic study of adversarial safety attacks on embodied AI systems, grounded in ISO standards for human-robot interactions. We (1) formalize a principled taxonomy of safety violations (critical, dangerous, risky) based on physical constraints such as separation distance, velocity, and collision boundaries; (2) introduce ANNIEBench, a benchmark of nine safety-critical scenarios with 2,400 video-action sequences for evaluating embodied safety; and (3) introduce ANNIE-Attack, a task-aware adversarial framework with an attack leader model that decomposes long-horizon goals into frame-level perturbations. Our evaluation across representative EAI models shows attack success rates exceeding 50% across all safety categories. We further demonstrate sparse and adaptive attack strategies and validate the real-world impact through physical robot experiments. These results expose a previously underexplored but highly consequential attack surface in embodied AI systems, highlighting the urgent need for security-driven defenses in the physical AI era. Code is available at https://github.com/RLCLab/Annie.
Key Contributions
- Principled taxonomy of EAI safety violations (critical, dangerous, risky) grounded in ISO human-robot interaction standards using physical constraints such as separation distance, velocity, and collision boundaries.
- ANNIEBench: a benchmark of 9 safety-critical scenarios with 2,400 video-action sequences for evaluating adversarial safety in embodied AI systems.
- ANNIE-Attack: a task-aware adversarial framework with an attack leader model that decomposes long-horizon robot goals into frame-level visual perturbations, achieving >50% attack success across all safety categories including sparse and adaptive variants.
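The leader-model idea above can be sketched structurally: a leader maps a long-horizon unsafe goal to per-frame attack directives, with a sparsity knob for the sparse-attack variant. This is a minimal illustration under stated assumptions; the class and field names (`FrameDirective`, `leader_decompose`, `sparsity`) are hypothetical, and the paper's actual leader is a learned model, not a fixed stride rule.

```python
# Hedged structural sketch of a task-aware "attack leader" that decomposes a
# long-horizon unsafe goal into frame-level directives. All names here are
# illustrative, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class FrameDirective:
    frame_idx: int
    unsafe_target: str   # e.g. "violate minimum separation distance"
    perturb: bool        # sparse attacks leave some frames untouched

def leader_decompose(num_frames: int, unsafe_goal: str, sparsity: float = 0.5):
    """Map a long-horizon unsafe goal to per-frame directives.

    `sparsity` is the fraction of frames attacked; for reproducibility this
    sketch deterministically attacks every k-th frame rather than learning
    which frames matter most, as a real leader model would.
    """
    stride = max(1, round(1 / sparsity))
    return [
        FrameDirective(i, unsafe_goal, perturb=(i % stride == 0))
        for i in range(num_frames)
    ]

directives = leader_decompose(8, "violate minimum separation distance", sparsity=0.25)
attacked = [d.frame_idx for d in directives if d.perturb]
# With sparsity=0.25 the stride is 4, so frames 0 and 4 carry perturbations.
```

The design point the sketch captures is that the attack budget is allocated over time: only the directives with `perturb=True` receive frame-level perturbations, which is what makes the sparse variant cheaper while still steering the long-horizon trajectory.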
🛡️ Threat Analysis
ANNIE-Attack crafts gradient-based frame-level adversarial perturbations on visual sensory inputs to VLA models at inference time, causing the model to output unsafe physical actions — a direct input manipulation attack that subverts the model's intended behavior.
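The core mechanism can be illustrated with a single L∞-bounded, signed-gradient step on one frame (FGSM-style), pushing a policy's action output toward an attacker-chosen unsafe target. This is a toy sketch, not the paper's method: the linear `toy_policy`, the pixel budget `EPSILON`, and the analytic gradient are all simplifying assumptions standing in for a real VLA model and autodiff.

```python
# Hedged sketch of one frame-level, L_inf-bounded adversarial step.
# The toy linear "policy" and all names are illustrative assumptions.

EPSILON = 8 / 255  # common imperceptibility budget for pixels in [0, 1]

def toy_policy(frame, weights):
    """Toy stand-in for a VLA action head: maps pixels to a scalar 'action'."""
    return sum(p * w for p, w in zip(frame, weights))

def adversarial_loss(frame, weights, unsafe_target):
    """Squared distance between the predicted action and the unsafe target."""
    return (toy_policy(frame, weights) - unsafe_target) ** 2

def fgsm_step(frame, weights, unsafe_target, eps=EPSILON):
    """One signed-gradient step on the input frame, clipped to valid pixels."""
    # Analytic gradient of the squared loss w.r.t. pixel i: 2*(a - t)*w_i.
    a = toy_policy(frame, weights)
    grad = [2 * (a - unsafe_target) * w for w in weights]
    sign = lambda g: (g > 0) - (g < 0)
    # Step *against* the gradient to minimize the attacker's loss.
    return [min(1.0, max(0.0, p - eps * sign(g))) for p, g in zip(frame, grad)]

frame = [0.2, 0.5, 0.8, 0.1]      # a tiny stand-in "image"
weights = [1.0, -2.0, 0.5, 3.0]   # toy policy weights
target = 5.0                       # unsafe action the attacker wants

before = adversarial_loss(frame, weights, target)
adv = fgsm_step(frame, weights, target)
after = adversarial_loss(adv, weights, target)

# Perturbation stays within the L_inf budget, yet moves the action toward
# the unsafe target (attacker's loss decreases).
assert all(abs(a_px - p) <= EPSILON + 1e-9 for a_px, p in zip(adv, frame))
assert after < before
```

In the paper's setting this single step would be repeated per frame under the leader model's direction, against a real VLA's gradients; the sketch only shows why an imperceptibly small input change can shift the commanded physical action.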