attack 2025

Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools

Kanghua Mo ¹, Li Hu ², Yucheng Long ¹, Zhihao Li ¹

¹ Guangzhou University

² The Hong Kong Polytechnic University

0 citations

Published on arXiv

2508.02110

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

AMA achieves 81–95% attack success rates across ten tool-calling scenarios and four representative LLM agents while causing significant privacy leakage and evading prompt-level defenses, auditor-based detection, and MCP protocols.

Attractive Metadata Attack (AMA)

Novel technique introduced

Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. However, this tool-centric paradigm introduces a previously underexplored attack surface, where adversaries can manipulate tool metadata -- such as names, descriptions, and parameter schemas -- to influence agent behavior. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents, without requiring prompt injection or access to model internals. To demonstrate and exploit this vulnerability, we propose the Attractive Metadata Attack (AMA), a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata through iterative optimization. The proposed attack integrates seamlessly into standard tool ecosystems and requires no modification to the agent's execution framework. Extensive experiments across ten realistic, simulated tool-use scenarios and a range of popular LLM agents demonstrate consistently high attack success rates (81\%-95\%) and significant privacy leakage, with negligible impact on primary task execution. Moreover, the attack remains effective even against prompt-level defenses, auditor-based detection, and structured tool-selection protocols such as the Model Context Protocol, revealing systemic vulnerabilities in current agent architectures. These findings reveal that metadata manipulation constitutes a potent and stealthy attack surface. Notably, AMA is orthogonal to injection attacks and can be combined with them to achieve stronger attack efficacy, highlighting the need for execution-level defenses beyond prompt-level and auditor-based mechanisms. Code is available at https://github.com/SEAIC-M/AMA.

Key Contributions

Introduces the Attractive Metadata Attack (AMA), the first attack that manipulates tool metadata (name, description, parameter schema) — rather than prompts or model internals — to induce LLM agents to preferentially invoke malicious tools.
Formulates metadata crafting as a state-action-value optimization problem using iterative in-context learning with three supporting mechanisms: generation traceability, weighted value evaluation, and batch generation.
Demonstrates 81–95% attack success rates across ten tool-use scenarios and four LLM agents, with effectiveness maintained against prompt-level defenses, auditor-based detection, and MCP-structured protocols.

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Datasets

Ten realistic simulated tool-use scenarios (custom)

Applications

llm agentstool-augmented ai systemsmcp-based agent frameworks

Read PDF arXiv Code

Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP

STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents

Takedown: How It's Done in Modern Coding Agent Exploits

AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search