benchmark 2026

MalTool: Malicious Tool Attacks on LLM Agents

Yuepeng Hu ¹, Yuqi Jia ¹, Mengyuan Li ¹, Dawn Song ², Neil Zhenqiang Gong ¹

¹ Duke University

² UC Berkeley

0 citations · 39 references · arXiv (Cornell University)

Published on arXiv

2602.12194

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

MalTool generates functionally correct malicious tools even when the underlying coding LLMs are safety-aligned. Existing detection methods, including VirusTotal, show limited effectiveness against these tools, revealing a critical gap in LLM agent tool security.

MalTool

Novel technique introduced

In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user installs the tool and the LLM agent selects it during task execution, the tool can compromise the user's security and privacy. Prior work primarily focuses on manipulating tool names and descriptions to increase the likelihood of installation by users and selection by LLM agents. However, a successful attack also requires embedding malicious behaviors in the tool's code implementation, which remains largely unexplored. In this work, we bridge this gap by presenting the first systematic study of malicious tool code implementations. We first propose a taxonomy of malicious tool behaviors based on the confidentiality-integrity-availability triad, tailored to LLM-agent settings. To investigate the severity of the risks posed by attackers exploiting coding LLMs to automatically generate malicious tools, we develop MalTool, a coding-LLM-based framework that synthesizes tools exhibiting specified malicious behaviors, either as standalone tools or embedded within otherwise benign implementations. To ensure functional correctness and structural diversity, MalTool leverages an automated verifier that validates whether generated tools exhibit the intended malicious behaviors and differ sufficiently from prior instances, iteratively refining generations until success. Our evaluation demonstrates that MalTool is highly effective even when coding LLMs are safety-aligned. Using MalTool, we construct two datasets of malicious tools: 1,200 standalone malicious tools and 5,287 real-world tools with embedded malicious behaviors. We further show that existing detection methods, including commercial malware detection approaches such as VirusTotal and methods tailored to the LLM-agent setting, exhibit limited effectiveness at detecting the malicious tools, highlighting an urgent need for new defenses.
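The verify-and-refine loop described in the abstract can be sketched as a generic rejection loop. This is a minimal illustration, not the paper's implementation: the names `generate`, `behaves_as_specified`, `is_diverse`, `max_similarity`, and `max_rounds` are hypothetical, both callables are placeholders supplied by the caller, and the character-level similarity from Python's `difflib` is an assumed stand-in for whatever structural-diversity measure the authors use.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional


def is_diverse(candidate: str, accepted: List[str], max_similarity: float = 0.8) -> bool:
    """Return True if the candidate is not too similar to any accepted sample.

    Assumption: similarity is a plain character-level ratio; the paper's
    actual diversity criterion is not specified here.
    """
    return all(
        SequenceMatcher(None, candidate, prior).ratio() <= max_similarity
        for prior in accepted
    )


def generate_with_verifier(
    generate: Callable[[str], str],
    behaves_as_specified: Callable[[str], bool],
    accepted: List[str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Regenerate until a candidate passes both checks or rounds run out.

    `generate` receives feedback from the previous failed round (hypothetical
    interface); `behaves_as_specified` is the caller's behavior check.
    """
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(feedback)
        if not behaves_as_specified(candidate):
            feedback = "behavior check failed"
            continue
        if not is_diverse(candidate, accepted):
            feedback = "too similar to an earlier sample"
            continue
        accepted.append(candidate)  # record it so later samples must differ
        return candidate
    return None  # no candidate passed within the round budget
```

Passing the failure reason back to the generator is one simple way to make each retry an informed refinement rather than a blind resample; whether the original framework feeds back this kind of signal is an assumption.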

Key Contributions

CIA-triad-based taxonomy of malicious tool behaviors tailored to LLM agent settings (confidentiality, integrity, availability violations)
MalTool: a coding-LLM-based framework that synthesizes malicious tools with an automated verifier ensuring functional correctness and structural diversity, effective even against safety-aligned coding LLMs
Two benchmark datasets (1,200 standalone + 5,287 real-world-embedded malicious tools) demonstrating that commercial detectors like VirusTotal and LLM-agent-specific methods have limited effectiveness

🛡️ Threat Analysis

AI Supply Chain Attacks

The attack vector is an attacker uploading malicious tools to a distribution platform for users to install — a supply chain compromise of the LLM agent tool ecosystem before runtime. The paper explicitly frames this as a supply-chain-style attack where malicious artifacts are injected upstream.

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Datasets

custom benchmark: 1,200 standalone malicious tools; custom benchmark: 5,287 real-world tools with embedded malicious behaviors

Applications

llm agents, tool-augmented ai systems, agent tool distribution platforms

