defense 2026

Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

Zhenhua Zou , Sheng Guo , Qiuyang Zhan , Lepeng Zhao , Shuo Li , Qi Li , Ke Xu , Mingwei Xu , Zhuotao Liu

Tsinghua University

0 citations · 50 references · arXiv (Cornell University)

Published on arXiv

2602.10915

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Aura reduces high-risk Attack Success Rate from ~40% to 4.4% and improves low-risk Task Success Rate from ~75% to 94.3% compared to Doubao Mobile Assistant.

Aura (Agent Universal Runtime Architecture)

Novel technique introduced

The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current implementations predominantly rely on a "Screen-as-Interface" paradigm, which inherits structural vulnerabilities and conflicts with the mobile ecosystem's economic foundations. In this paper, we conduct a systematic security analysis of state-of-the-art mobile agents using Doubao Mobile Assistant as a representative case. We decompose the threat landscape into four dimensions - Agent Identity, External Interface, Internal Reasoning, and Action Execution - revealing critical flaws such as fake App identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation stemming from a reliance on unstructured visual data. To address these challenges, we propose Aura, an Agent Universal Runtime Architecture for a clean-slate secure agent OS. Aura replaces brittle GUI scraping with a structured, agent-native interaction model. It adopts a Hub-and-Spoke topology where a privileged System Agent orchestrates intent, sandboxed App Agents execute domain-specific tasks, and the Agent Kernel mediates all communication. The Agent Kernel enforces four defense pillars: (i) cryptographic identity binding via a Global Agent Registry; (ii) semantic input sanitization through a multilayer Semantic Firewall; (iii) cognitive integrity via taint-aware memory and plan-trajectory alignment; and (iv) granular access control with non-deniable auditing. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura improves low-risk Task Success Rate from roughly 75% to 94.3%, reduces high-risk Attack Success Rate from roughly 40% to 4.4%, and achieves near-order-of-magnitude latency gains. These results demonstrate Aura as a viable, secure alternative to the "Screen-as-Interface" paradigm.

Key Contributions

Systematic four-dimensional security analysis of state-of-the-art mobile LLM agents (Doubao), exposing fake app identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation as structural vulnerabilities of the Screen-as-Interface paradigm.
Aura: an Agent Universal Runtime Architecture with Hub-and-Spoke topology replacing GUI scraping with a structured agent-native interaction model, enforced by a privileged Agent Kernel.
Four defense pillars: cryptographic identity via a Global Agent Registry, a multilayer Semantic Firewall for prompt injection, taint-aware memory for cognitive integrity, and granular non-deniable action access control — evaluated on MobileSafetyBench.

🛡️ Threat Analysis

Details

Domains

nlpmultimodal

Model Types

llmvlm

Threat Tags

inference_timeblack_box

Datasets

MobileSafetyBench

Applications

mobile ai agentsllm-based task automationmobile operating systems

Read PDF arXiv DOI

Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Who Grants the Agent Power? Defending Against Instruction Injection via Task-Centric Access Control

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems

Enhancing Reliability in LLM-Integrated Robotic Systems: A Unified Approach to Security and Safety

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents