Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Kanghua Mo 1, Li Hu 2, Yucheng Long 1, Zhihao Li 1
Published on arXiv
2508.02110
Insecure Plugin Design
OWASP LLM Top 10 — LLM07
Key Finding
AMA achieves 81–95% attack success rates across ten tool-calling scenarios and four representative LLM agents while causing significant privacy leakage and evading prompt-level defenses, auditor-based detection, and MCP protocols.
Attractive Metadata Attack (AMA)
Novel technique introduced
Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. However, this tool-centric paradigm introduces a previously underexplored attack surface, where adversaries can manipulate tool metadata -- such as names, descriptions, and parameter schemas -- to influence agent behavior. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents, without requiring prompt injection or access to model internals. To demonstrate and exploit this vulnerability, we propose the Attractive Metadata Attack (AMA), a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata through iterative optimization. The proposed attack integrates seamlessly into standard tool ecosystems and requires no modification to the agent's execution framework. Extensive experiments across ten realistic, simulated tool-use scenarios and a range of popular LLM agents demonstrate consistently high attack success rates (81\%-95\%) and significant privacy leakage, with negligible impact on primary task execution. Moreover, the attack remains effective even against prompt-level defenses, auditor-based detection, and structured tool-selection protocols such as the Model Context Protocol, revealing systemic vulnerabilities in current agent architectures. These findings reveal that metadata manipulation constitutes a potent and stealthy attack surface. Notably, AMA is orthogonal to injection attacks and can be combined with them to achieve stronger attack efficacy, highlighting the need for execution-level defenses beyond prompt-level and auditor-based mechanisms. Code is available at https://github.com/SEAIC-M/AMA.
Key Contributions
- Introduces the Attractive Metadata Attack (AMA), the first attack that manipulates tool metadata (name, description, parameter schema) — rather than prompts or model internals — to induce LLM agents to preferentially invoke malicious tools.
- Formulates metadata crafting as a state-action-value optimization problem using iterative in-context learning with three supporting mechanisms: generation traceability, weighted value evaluation, and batch generation.
- Demonstrates 81–95% attack success rates across ten tool-use scenarios and four LLM agents, with effectiveness maintained against prompt-level defenses, auditor-based detection, and MCP-structured protocols.