defense arXiv Oct 30, 2025 · Oct 2025
Yifeng Cai, Ziming Wang, Zhaomeng Deng et al. · Peking University · Huazhong University of Science and Technology +1 more
Defends mobile AI agents against indirect instruction injection using dynamic, task-scoped minimal permissions via runtime access control
Prompt Injection Excessive Agency nlpmultimodal
AI agents capable of GUI understanding and Model Context Protocol are increasingly deployed to automate mobile tasks. However, their reliance on over-privileged, static permissions creates a critical vulnerability: instruction injection. Malicious instructions, embedded in otherwise benign content like emails, can hijack the agent to perform unauthorized actions. We present AgentSentry, a lightweight runtime task-centric access control framework that enforces dynamic, task-scoped permissions. Instead of granting broad, persistent permissions, AgentSentry dynamically generates and enforces minimal, temporary policies aligned with the user's specific task (e.g., register for an app), revoking them upon completion. We demonstrate that AgentSentry successfully prevents an instruction injection attack, where an agent is tricked into forwarding private emails, while allowing the legitimate task to complete. Our approach highlights the urgent need for intent-aligned security models to safely govern the next generation of autonomous agents.
llm vlm multimodal Peking University · Huazhong University of Science and Technology · University of Illinois Urbana-Champaign
attack arXiv Jan 22, 2026 · 10w ago
Mengyu Yao, Ziqi Zhang, Ning Luo et al. · Peking University · University of Illinois Urbana-Champaign
Attacks RAG systems to steal private knowledge bases via knowledge-graph-guided adaptive queries, achieving 84.4% corpus coverage in 1,000 queries
Sensitive Information Disclosure nlp
Stealing attacks pose a persistent threat to the intellectual property of deployed machine-learning systems. Retrieval-augmented generation (RAG) intensifies this risk by extending the attack surface beyond model weights to knowledge base that often contains IP-bearing assets such as proprietary runbooks, curated domain collections, or licensed documents. Recent work shows that multi-turn questioning can gradually steal corpus content from RAG systems, yet existing attacks are largely heuristic and often plateau early. We address this gap by formulating RAG knowledge-base stealing as an adaptive stochastic coverage problem (ASCP), where each query is a stochastic action and the goal is to maximize the conditional expected marginal gain (CMG) in corpus coverage under a query budget. Bridging ASCP to real-world black-box RAG knowledge-base stealing raises three challenges: CMG is unobservable, the natural-language action space is intractably large, and feasibility constraints require stealthy queries that remain effective under diverse architectures. We introduce RAGCrawler, a knowledge graph-guided attacker that maintains a global attacker-side state to estimate coverage gains, schedule high-value semantic anchors, and generate non-redundant natural queries. Across four corpora and four generators with BGE retriever, RAGCrawler achieves 66.8% average coverage (up to 84.4%) within 1,000 queries, improving coverage by 44.90% relative to the strongest baseline. It also reduces the queries needed to reach 70% coverage by at least 4.03x on average and enables surrogate reconstruction with answer similarity up to 0.699. Our attack is also scalable to retriever switching and newer RAG techniques like query rewriting and multi-query retrieval. These results highlight urgent needs to protect RAG knowledge assets.
llm transformer Peking University · University of Illinois Urbana-Champaign