benchmark 2025

Measuring the Security of Mobile LLM Agents under Adversarial Prompts from Untrusted Third-Party Channels

Chenghao Du 1, Quanfeng Huang , Tingxuan Tang 1, Zihao Wang 2, Adwait Nadkarni 1, Yue Xiao 1

0 citations · 93 references · arXiv

α

Published on arXiv

2510.27140

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Fraudulent ad prompt injections succeed with >80% reliability across mobile LLM agents, and even complex workflows requiring circumvention of OS security warnings (e.g., malware installation) are consistently completed by advanced multi-app agents.


Large Language Models (LLMs) have transformed software development, enabling AI-powered applications known as LLM-based agents that promise to automate tasks across diverse apps and workflows. Yet, the security implications of deploying such agents in adversarial mobile environments remain poorly understood. In this paper, we present the first systematic study of security risks in mobile LLM agents. We design and evaluate a suite of adversarial case studies, ranging from opportunistic manipulations such as pop-up advertisements to advanced, end-to-end workflows involving malware installation and cross-app data exfiltration. Our evaluation covers eight state-of-the-art mobile agents across three architectures, with over 2,000 adversarial and paired benign trials. The results reveal systemic vulnerabilities: low-barrier vectors such as fraudulent ads succeed with over 80% reliability, while even workflows requiring the circumvention of operating-system warnings, such as malware installation, are consistently completed by advanced multi-app agents. By mapping these attacks to the MITRE ATT&CK Mobile framework, we uncover novel privilege-escalation and persistence pathways unique to LLM-driven automation. Collectively, our findings provide the first end-to-end evidence that mobile LLM agents are exploitable in realistic adversarial settings, where untrusted third-party channels (e.g., ads, embedded webviews, cross-app notifications) are an inherent part of the mobile ecosystem.


Key Contributions

  • First systematic measurement study of security risks in mobile LLM agents, evaluating 8 agents across 3 architectures with over 2,000 adversarial and paired benign trials
  • Suite of adversarial case studies spanning low-barrier opportunistic attacks (fraudulent ads, >80% success rate) to advanced end-to-end workflows involving malware installation and cross-app data exfiltration
  • MITRE ATT&CK Mobile framework mapping revealing novel privilege-escalation and persistence pathways unique to LLM-driven mobile automation

🛡️ Threat Analysis


Details

Domains
nlpmultimodal
Model Types
llmvlmmultimodal
Threat Tags
black_boxinference_time
Applications
mobile llm agentsmobile app automationai personal assistants