Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System
Zhenhua Zou , Sheng Guo , Qiuyang Zhan , Lepeng Zhao , Shuo Li , Qi Li , Ke Xu , Mingwei Xu , Zhuotao Liu
Published on arXiv
2602.10915
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
Aura reduces high-risk Attack Success Rate from ~40% to 4.4% and improves low-risk Task Success Rate from ~75% to 94.3% compared to Doubao Mobile Assistant.
Aura (Agent Universal Runtime Architecture)
Novel technique introduced
The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current implementations predominantly rely on a "Screen-as-Interface" paradigm, which inherits structural vulnerabilities and conflicts with the mobile ecosystem's economic foundations. In this paper, we conduct a systematic security analysis of state-of-the-art mobile agents using Doubao Mobile Assistant as a representative case. We decompose the threat landscape into four dimensions - Agent Identity, External Interface, Internal Reasoning, and Action Execution - revealing critical flaws such as fake App identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation stemming from a reliance on unstructured visual data. To address these challenges, we propose Aura, an Agent Universal Runtime Architecture for a clean-slate secure agent OS. Aura replaces brittle GUI scraping with a structured, agent-native interaction model. It adopts a Hub-and-Spoke topology where a privileged System Agent orchestrates intent, sandboxed App Agents execute domain-specific tasks, and the Agent Kernel mediates all communication. The Agent Kernel enforces four defense pillars: (i) cryptographic identity binding via a Global Agent Registry; (ii) semantic input sanitization through a multilayer Semantic Firewall; (iii) cognitive integrity via taint-aware memory and plan-trajectory alignment; and (iv) granular access control with non-deniable auditing. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura improves low-risk Task Success Rate from roughly 75% to 94.3%, reduces high-risk Attack Success Rate from roughly 40% to 4.4%, and achieves near-order-of-magnitude latency gains. These results demonstrate Aura as a viable, secure alternative to the "Screen-as-Interface" paradigm.
Key Contributions
- Systematic four-dimensional security analysis of state-of-the-art mobile LLM agents (Doubao), exposing fake app identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation as structural vulnerabilities of the Screen-as-Interface paradigm.
- Aura: an Agent Universal Runtime Architecture with Hub-and-Spoke topology replacing GUI scraping with a structured agent-native interaction model, enforced by a privileged Agent Kernel.
- Four defense pillars: cryptographic identity via a Global Agent Registry, a multilayer Semantic Firewall for prompt injection, taint-aware memory for cognitive integrity, and granular non-deniable action access control — evaluated on MobileSafetyBench.