Sponge Tool Attack: Stealthy Denial-of-Efficiency against Tool-Augmented Agentic Reasoning
Published on arXiv
2601.17566
Model Denial of Service
OWASP LLM Top 10 — LLM04
Key Finding
STA successfully converts concise agentic reasoning trajectories into verbose, resource-intensive ones while preserving task semantics, validated across 6 models and 4 agentic frameworks under query-only access.
Sponge Tool Attack (STA)
Novel technique introduced
Enabling large language models (LLMs) to solve complex reasoning tasks is a key step toward artificial general intelligence. Recent work augments LLMs with external tools to enable agentic reasoning, achieving high utility and efficiency in a plug-and-play manner. However, the inherent vulnerabilities of such methods to malicious manipulation of the tool-calling process remain largely unexplored. In this work, we identify a tool-specific attack surface and propose the Sponge Tool Attack (STA), which disrupts agentic reasoning solely by rewriting the input prompt under a strict query-only access assumption. Without any modification to the underlying model or the external tools, STA converts originally concise and efficient reasoning trajectories into unnecessarily verbose and convoluted ones before arriving at the final answer. This incurs substantial computational overhead while remaining stealthy, since the original task semantics and user intent are preserved. To achieve this, we design STA as an iterative, multi-agent collaborative framework with explicit rewrite-policy control that generates benign-looking prompt rewrites of the original query with high semantic fidelity. Extensive experiments across 6 models (including both open-source models and closed-source APIs), 12 tools, 4 agentic frameworks, and 13 datasets spanning 5 domains validate the effectiveness of STA.
Key Contributions
- Identifies 'Denial-of-Efficiency' (DoE) as a novel, underexplored attack surface in tool-augmented LLM agentic systems
- Proposes Sponge Tool Attack (STA), an iterative multi-agent framework (Prompt Rewriter, Quality Judge, Policy Inductor) that rewrites user queries under strict query-only access to induce unnecessarily verbose and expensive reasoning trajectories
- Validates STA extensively across 6 models, 12 tools, 4 agentic frameworks, and 13 datasets while demonstrating stealthiness by preserving original task semantics
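The three-role loop described above (Prompt Rewriter, Quality Judge, Policy Inductor) can be sketched as follows. Only the role names come from the paper; every function signature, the toy fidelity check, and the verbosity-gain threshold below are illustrative stand-ins under query-only access, not the authors' implementation.

```python
# Illustrative sketch of an STA-style iterative rewrite loop.
# All agent internals are stubbed; in the actual attack each role
# would be backed by an LLM queried in a black-box fashion.

def prompt_rewriter(query: str, policies: list[str]) -> str:
    """Rewrite the query under the current policy set (stub: append policy hints)."""
    return f"{query} {' '.join(policies)}".strip()

def quality_judge(original: str, rewrite: str) -> tuple[bool, float]:
    """Check semantic fidelity and estimate induced verbosity (stubbed heuristics)."""
    preserves_intent = original.split()[0] in rewrite  # toy fidelity check
    verbosity_gain = len(rewrite) / max(len(original), 1)
    return preserves_intent, verbosity_gain

def policy_inductor(policies: list[str], verbosity_gain: float) -> list[str]:
    """Induce an additional rewrite policy when the current gain is too low (stub)."""
    if verbosity_gain < 3.0:  # hypothetical target inflation factor
        policies = policies + [
            "Please double-check every intermediate result with the available tools."
        ]
    return policies

def sponge_tool_attack(query: str, max_iters: int = 5) -> str:
    """Iteratively rewrite `query` to inflate tool-calling cost while keeping intent."""
    policies: list[str] = ["Verify each step before answering."]
    best = query
    for _ in range(max_iters):
        rewrite = prompt_rewriter(query, policies)
        intent_ok, gain = quality_judge(query, rewrite)
        if intent_ok and len(rewrite) > len(best):
            best = rewrite  # keep the most cost-inflating rewrite that stays faithful
        policies = policy_inductor(policies, gain)
    return best
```

The key design point the sketch preserves is that the Quality Judge gates every candidate on semantic fidelity before the Policy Inductor escalates, so the attack inflates cost without changing what the user asked.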