ML Security Papers

Latest papers

137 papers

survey arXiv Apr 30, 2026 · 21d ago

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

Luyao Xu, Xiang Chen · Nantong University · Nanjing University

Layered security review of LLM agent frameworks covering prompt injection, tool misuse, state persistence attacks, and ecosystem vulnerabilities

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF

defense arXiv Apr 29, 2026 · 22d ago

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

Hung Dang · Van Lang University

Stateful behavioral firewall for LLM agents using compiled benign traces to block context-sequential tool-call attacks

Insecure Plugin Design Excessive Agency nlp

PDF

benchmark arXiv Apr 27, 2026 · 24d ago

A Comparative Evaluation of AI Agent Security Guardrails

Qi Li, Jiu Li, Pingtao Wei et al. · Beijing Caizhi Tech

Benchmarks four commercial AI agent security guardrails on detecting prompt injection, instruction override, and harmful content requests

Prompt Injection Insecure Plugin Design nlp

PDF

survey arXiv Apr 25, 2026 · 26d ago

From Stateless Queries to Autonomous Actions: A Layered Security Framework for Agentic AI Systems

Kexin Chu · University of Connecticut

Surveys 94 papers on agentic AI security, proposing a seven-layer architectural framework and temporal attack taxonomy

Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal

PDF

defense arXiv Apr 25, 2026 · 26d ago

Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents

Yuandao Cai, Wensheng Tang, Cheng Wen et al. · The Hong Kong University of Science and Technology · Xidian University

Taint tracking framework that detects malicious data flows in LLM agents from untrusted sources to privileged actions

Prompt Injection Insecure Plugin Design Blue-Team Agents nlp

PDF

tool arXiv Apr 23, 2026 · 28d ago

MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

Run Hao, Zhuoran Tan · Aarhus University · University of Glasgow

Security testing framework for MCP tool servers detecting developer pitfalls through static analysis and trace-based validation

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection Benchmarks & Evaluation Blue-Team Agents multimodal nlp

PDF

attack arXiv Apr 22, 2026 · 29d ago

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis et al. · IBM Research Europe · Trinity College Dublin +1 more

Gradient-based adversarial attack that hijacks LLM function calling by inserting optimized tokens into function descriptions to force invocation of attacker-chosen tools

Input Manipulation Attack Insecure Plugin Design Excessive Agency nlp

PDF

defense arXiv Apr 20, 2026 · 4w ago

From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers

Xiangyu Wen, Yuang Zhao, Xiaoyu Xu et al. · The Chinese University of Hong Kong · Shanghai Jiao Tong University +3 more

Kernel-based security architecture for LLM agents that intercepts unsafe tool calls using deterministic taint tracking and dependency graphs

Insecure Plugin Design Excessive Agency nlp

PDF Code

defense arXiv Apr 20, 2026 · 4w ago

AgenTEE: Confidential LLM Agent Execution on Edge Devices

Sina Abdollahi, Mohammad M Maheri, Javad Forough et al. · Imperial College London · Dartmouth College

Secure LLM agent deployment system using Arm confidential VMs to isolate runtime, inference, and plugins on edge devices

AI Supply Chain Attacks Insecure Plugin Design Excessive Agency nlp

PDF

defense arXiv Apr 19, 2026 · 4w ago

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu, Eugene Ilyushin, Jie Ni et al. · Lomonosov Moscow State University · Central University

Runtime security architecture defending LLM agents against prompt injection by mediating tool-use actions with stateful risk reasoning

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF

defense arXiv Apr 18, 2026 · 4w ago

CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems

İpek Abasıkeleş Turgut, Edip Gümüş · Iskenderun Technical University

Three-layer defense system detecting prompt injection and tool poisoning in MCP-based LLM applications using local embeddings and pattern analysis

Prompt Injection Insecure Plugin Design nlp

PDF

defense arXiv Apr 18, 2026 · 4w ago

CapSeal: Capability-Sealed Secret Mediation for Secure Agent Execution

Shutong Jin, Ruiyi Guo, Ray C. C. Cheung · City University of Hong Kong · Beijing Foreign Studies University

Broker-mediated capability system that prevents AI agents from directly accessing secrets, defending against prompt injection exfiltration

Prompt Injection Insecure Plugin Design nlp

PDF

attack arXiv Apr 17, 2026 · 4w ago

Conjunctive Prompt Attacks in Multi-Agent LLM Systems

Nokimul Hasan Arif, Qian Lou, Mengxin Zheng · University of Central Florida

Conjunctive prompt injection attack on multi-agent LLM systems that splits malicious payload across user query and compromised remote agent

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF Code

defense arXiv Apr 15, 2026 · 5w ago

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin, Yang Liu, Yancheng Chen et al. · Chinese Academy of Sciences · Institute of Applied Physics and Computational Mathematics +1 more

Multi-layer security architecture embedded in LLM agent execution harnesses to defend against prompt injection and tool misuse attacks

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF

defense arXiv Apr 13, 2026 · 5w ago

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

Wei Zhao, Zhe Li, Peixin Zhang et al. · Singapore Management University

Runtime framework enforcing user-confirmed rules at tool-call boundaries to block indirect prompt injection across web, MCP, and skill channels

Prompt Injection Insecure Plugin Design nlp

PDF Code

defense arXiv Apr 11, 2026 · 5w ago

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

Guijia Zhang, Shu Yang, Xilin Gong et al. · Shenzhen University · King Abdullah University of Science & Technology +2 more

Runtime risk-scoring system for LLM agent tool calls that detects indirect prompt injection attacks before execution

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF Code

attack arXiv Apr 9, 2026 · 6w ago

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Hanzhi Liu, Chaofan Shou, Hongbo Wen et al. · University of California · Fuzzland +1 more

Malicious LLM API routers inject code into tool calls and steal credentials from agent frameworks in the wild

AI Supply Chain Attacks Insecure Plugin Design Sensitive Information Disclosure nlp

PDF

benchmark arXiv Apr 8, 2026 · 6w ago

MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan et al. · Old Dominion University

Security taxonomy for MCP-based LLM agents mapping threats across six architectural layers and revealing defense gaps in orchestration and supply chain

Insecure Plugin Design Excessive Agency nlp

PDF

benchmark arXiv Apr 8, 2026 · 6w ago

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang et al. · CyCraft · National Taiwan University

Benchmark evaluating LLM safety guardrails on multi-step agent tool-calling trajectories across 12 risk categories including prompt injection

Prompt Injection Insecure Plugin Design Excessive Agency Benchmarks & Evaluation Blue-Team Agents nlp

PDF

defense arXiv Apr 8, 2026 · 6w ago

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye, Zhechang Zhang, Jinyuan Jia et al. · The Pennsylvania State University

Prevents LLM tool poisoning by auto-generating trusted tool descriptions from source code via static analysis and dynamic verification

Prompt Injection Insecure Plugin Design nlp

PDF
