Attack · 2025

ToolTweak: An Attack on Tool Selection in LLM-based Agents

Jonathan Sneh 1,2, Ruomei Yan 2, Jialin Yu 1,2, Philip Torr 1, Yarin Gal 1, Sunando Sengupta 2, Eric Sommerlade 2, Alasdair Paren 1, Adel Bibi 1

6 citations · 1 influential · 46 references · arXiv


Published on arXiv · 2510.02554

Insecure Plugin Design (OWASP LLM Top 10 — LLM07)

Prompt Injection (OWASP LLM Top 10 — LLM01)

Key Finding

ToolTweak raises a targeted tool's selection rate from a ~20% baseline to as high as 81%, with strong cross-model transferability between open-source and closed-source LLMs.

ToolTweak

Novel technique introduced


As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities. These agents typically select tools from growing databases or marketplaces to solve user tasks, creating implicit competition among tool providers and developers for visibility and usage. In this paper, we show that this selection process harbors a critical vulnerability: by iteratively manipulating tool names and descriptions, adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%, with strong transferability between open-source and closed-source models. Beyond individual tools, we show that such attacks cause distributional shifts in tool usage, revealing risks to fairness, competition, and security in emerging tool ecosystems. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally. All code will be open-sourced upon acceptance.
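The abstract describes ToolTweak as a lightweight, iterative attack that uses LLM feedback to rewrite a tool's name and description until agents prefer it. The paper's exact optimization loop is not reproduced here; the following is a minimal, hypothetical sketch of one plausible greedy hill-climb, where `llm` is any text-in/text-out model callable and the prompts, the `|||` output convention, and the helper names (`selection_rate`, `tooltweak`) are all illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of an iterative, LLM-feedback attack loop in the spirit
# of ToolTweak. All prompts, the "|||" output convention, and the helper names
# are illustrative assumptions, not the paper's code.
import random
from typing import Callable, Dict, List

Tool = Dict[str, str]  # {"name": ..., "description": ...}

def selection_rate(llm: Callable[[str], str], target: Tool,
                   competitors: List[Tool], tasks: List[str]) -> float:
    """Fraction of tasks for which the simulated agent picks the target tool."""
    wins = 0
    for task in tasks:
        pool = competitors + [target]
        random.shuffle(pool)  # randomize listing order to avoid position bias
        listing = "\n".join(f"- {t['name']}: {t['description']}" for t in pool)
        answer = llm(f"Task: {task}\nAvailable tools:\n{listing}\n"
                     "Reply with the name of the single best tool.")
        wins += target["name"].lower() in answer.lower()
    return wins / len(tasks)

def tooltweak(llm: Callable[[str], str], target: Tool, competitors: List[Tool],
              tasks: List[str], iterations: int = 10) -> Tool:
    """Greedy hill-climb: ask an LLM to rewrite the target's listing, and keep
    a rewrite only if it raises the measured selection rate."""
    best = dict(target)
    best_rate = selection_rate(llm, best, competitors, tasks)
    for _ in range(iterations):
        suggestion = llm(
            "Rewrite this tool listing so an AI agent is more likely to pick it, "
            "without changing the tool's functionality.\n"
            f"Name: {best['name']}\nDescription: {best['description']}\n"
            "Answer as: <name> ||| <description>")
        name, sep, desc = suggestion.partition("|||")
        if not sep:
            continue  # malformed rewrite; skip this round
        candidate = {"name": name.strip(), "description": desc.strip()}
        rate = selection_rate(llm, candidate, competitors, tasks)
        if rate > best_rate:
            best, best_rate = candidate, rate
    return best
```

Per the abstract, such rewrites transfer: a listing optimized against one model also biases other open-source and closed-source models.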


Key Contributions

  • ToolTweak: an iterative, LLM-feedback-based attack that automatically optimizes tool names and descriptions, raising a target tool's selection rate from a ~20% baseline to as high as 81%
  • Demonstrates distributional shifts in tool usage across tasks, revealing fairness and competition risks in tool marketplaces and MCP ecosystems
  • Evaluates paraphrasing and perplexity-filtering defenses that reduce selection bias (a minimal sketch of the perplexity filter follows this list), and shows strong attack transferability between open-source and closed-source LLMs
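As a companion illustration, here is a minimal sketch of the perplexity-filtering defense the paper evaluates: score each tool description with a small language model and reject outliers. The choice of GPT-2 as the scoring model and the threshold of 80 are assumptions made for illustration; the paper's actual filter configuration is not given here.

```python
# Minimal perplexity-filter sketch (one of the two defenses evaluated).
# Assumption: GPT-2 as scorer and a threshold of 80 are illustrative choices,
# not parameters taken from the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """exp(mean token negative log-likelihood) of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(input_ids=ids, labels=ids).loss  # mean cross-entropy
    return float(torch.exp(loss))

def filter_tools(tools: list, threshold: float = 80.0) -> list:
    """Keep only tools whose description perplexity is at or below the
    threshold; heavily optimized promotional text can score as atypical."""
    return [t for t in tools if perplexity(t["description"]) <= threshold]
```

The paraphrasing defense operates at the same interface point: before the agent sees the tool list, an LLM rewrites each description in neutral language, washing out adversarial phrasing rather than filtering it.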

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
grey_box · black_box · inference_time · targeted
Applications
llm-based agents · tool marketplaces · model context protocol (mcp) systems