Latest papers

111 papers
defense arXiv Apr 2, 2026 · 4d ago

From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Yiheng Huang, Zhijia Zhao, Bihuan Chen et al. · Fudan University

Constructs dataset of 114 malicious MCP servers exploiting LLM tool-calling and proposes behavioral deviation detector achieving 94.6% F1

Insecure Plugin Design nlp
PDF
benchmark arXiv Mar 30, 2026 · 7d ago

Evaluating Privilege Usage of Agents on Real-World Tools

Quan Zhang, Lianhang Fu, Lvsi Lian et al. · East China Normal University · Xinjiang University +1 more

Benchmark evaluating LLM agents' privilege control under prompt injection attacks using real-world tools, finding 84.80% attack success

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
attack arXiv Mar 25, 2026 · 12d ago

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search

Yulin Shen, Xudong Pan, Geng Hong et al. · Fudan University · Shanghai Innovation Institute

Black-box tree-search attack generating stealthy injection payloads that hijack MCP-enabled LLM agents through manipulated tool responses

Prompt Injection Insecure Plugin Design nlp
PDF
survey arXiv Mar 24, 2026 · 13d ago

SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy

Ali Dehghantanha, Sajad Homayoun · University of Guelph · Aalborg University

Surveys attack surface of agentic LLM systems: prompt injection, RAG poisoning, tool exploits, and multi-agent threats with defense taxonomy

Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal
PDF
benchmark arXiv Mar 23, 2026 · 14d ago

Are AI-assisted Development Tools Immune to Prompt Injection?

Charoes Huang, Xin Huang, Amin Milani Fard · New York Institute of Technology

Evaluates prompt injection and tool-poisoning vulnerabilities across seven MCP-based AI coding assistants, revealing major security gaps

Prompt Injection Insecure Plugin Design nlp
PDF
benchmark arXiv Mar 23, 2026 · 14d ago

Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning

Charoes Huang, Xin Huang, Ngoc Phu Tran et al. · New York Institute of Technology

Threat models MCP client vulnerabilities using STRIDE/DREAD frameworks, revealing tool poisoning as critical attack vector across seven major clients

Prompt Injection Insecure Plugin Design nlp
PDF
defense arXiv Mar 22, 2026 · 15d ago

Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains

Octavian Untila · Aisophical SRL

Autonomous AI system independently discovers SMT-based formal verification for AI safety across six domains with 100% accuracy

Output Integrity Attack Insecure Plugin Design Excessive Agency nlp multimodal
PDF
attack CoDAIM workshop Mar 21, 2026 · 16d ago

ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore

Yusheng Zheng, Yiwei Yang, Wei Zhang et al. · UC Santa Cruz · University of Connecticut

LLM agent checkpoint-restore creates replay vulnerabilities enabling duplicate payments and credential reuse through non-deterministic request regeneration

Insecure Plugin Design Excessive Agency nlp
PDF
attack arXiv Mar 20, 2026 · 17d ago

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Fazhong Liu, Zhuoyan Chen, Tu Lan et al. · Shanghai Jiao Tong University

Supply chain attack embedding malicious operational narratives in autonomous coding agent bootstrap guidance, achieving up to 64% success rate

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp
PDF
attack arXiv Mar 14, 2026 · 23d ago

ToolFlood: Beyond Selection -- Hiding Valid Tools from LLM Agents via Semantic Covering

Hussein Jawad, Nicolas J-B Brunel · Capgemini Invent · University Paris-Saclay +1 more

Denial-of-service attack on LLM agents that injects adversarial tools to dominate retrieval and hide all legitimate tools

Input Manipulation Attack Insecure Plugin Design Model Denial of Service nlp
PDF Code
survey arXiv Mar 13, 2026 · 24d ago

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Zonghao Ying, Xiao Yang, Siyang Wu et al. · Beihang University · Zhongguancun Laboratory +1 more

Security analysis of OpenClaw autonomous agents revealing prompt injection RCE, tool chain attacks, and proposing FASA defense architecture

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal
PDF Code
survey arXiv Mar 12, 2026 · 25d ago

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu et al. · Ant Group · Tsinghua University

Proposes five-layer lifecycle security framework for autonomous LLM agents, analyzing prompt injection, supply chain, memory poisoning, and intent drift threats

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
tool arXiv Mar 12, 2026 · 25d ago

OpenClaw PRISM: A Zero-Fork, Defense-in-Depth Runtime Security Layer for Tool-Augmented LLM Agents

Frank Li · UNSW Sydney

Deployable runtime security layer for LLM agent gateways defending against prompt injection and unsafe tool execution across ten lifecycle hooks

Prompt Injection Insecure Plugin Design nlp
PDF
survey arXiv Mar 11, 2026 · 26d ago

The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey

Juhee Kim, Xiaoyuan Liu, Zhun Wang et al. · University of California · Seoul National University +1 more

Surveys attacks and defenses across agentic LLM systems, covering prompt injection, insecure tool use, and excessive agency risks

Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal
PDF
defense arXiv Mar 10, 2026 · 27d ago

Execution Is the New Attack Surface: Survivability-Aware Agentic Crypto Trading with OpenClaw-Style Local Executors

Ailiya Borjigin, Igor Stadnyk, Ben Bilski et al. · True Trading · Inc4.net

Execution-layer middleware enforcing non-bypassable invariants on LLM agent tool calls in crypto trading to prevent manipulation-induced financial losses

Insecure Plugin Design Excessive Agency nlp reinforcement-learning
PDF
attack arXiv Mar 10, 2026 · 27d ago

Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities

Nanzi Yang, Weiheng Bai, Kangjie Lu · University of Minnesota

Systematically exploits MCP SDK non-compliance vulnerabilities to launch silent prompt injection and DoS attacks against LLM agents

Insecure Plugin Design Prompt Injection nlp
PDF
defense arXiv Mar 9, 2026 · 28d ago

SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration

Jianshu She · MBZUAI

Defends enterprise LLM agents against data leakage by splitting sensitive handling from cloud reasoning with context-aware sanitization

Sensitive Information Disclosure Insecure Plugin Design nlp
PDF
survey arXiv Mar 8, 2026 · 29d ago

From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Xiaolei Zhang, Lu Zhou, Xiaogang Xu et al. · Nanjing University of Aeronautics and Astronautics · Collaborative Innovation Center of Novel Software Technology and Industrialization +5 more

Surveys LLM agent security threats across three autonomy tiers: cognitive manipulation, tool misuse, and multi-agent systemic failures

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
benchmark arXiv Mar 8, 2026 · 29d ago

Give Them an Inch and They Will Take a Mile: Understanding and Measuring Caller Identity Confusion in MCP-Based AI Systems

Yuhang Huang, Boyang Ma, Biwei Yan et al. · Shandong University · City University of Hong Kong

Large-scale empirical analysis reveals MCP servers fail to authenticate callers, enabling unauthorized tool access in LLM agent systems

Insecure Plugin Design nlp
PDF
defense arXiv Mar 7, 2026 · 4w ago

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

Yuxu Ge · University of York

Four-layer governance framework defends LLM agents against prompt injection, RAG poisoning, and malicious plugins with 96% interception rate

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF