Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents
Kaiyu Zhou 1, Yongsen Zheng 1, Yicheng He 2, Meng Xue 3, Xueluan Gong 1, Yuji Wang 4, Xuanye Zhang 1, Kwok-Yan Lam 1
1 Nanyang Technological University
2 University of Illinois Urbana-Champaign
Published on arXiv: 2601.10955
OWASP LLM Top 10: LLM04 (Model Denial of Service)
OWASP LLM Top 10: LLM07 (Insecure Plugin Design)
Key Finding
The attack inflates LLM agent costs by up to 658x and energy consumption by 100–560x across six LLMs. It does so by stretching task trajectories beyond 60,000 tokens through stealthy manipulation of an MCP tool server, while keeping final answers correct to evade validation.
Novel technique introduced: MCTS-optimized Tool Response Manipulation
The agent-tool communication loop is a critical attack surface in modern Large Language Model (LLM) agents. Existing Denial-of-Service (DoS) attacks, primarily triggered via user prompts or injected retrieval-augmented generation (RAG) context, are ineffective for this new paradigm. They are fundamentally single-turn and often lack a task-oriented approach, making them conspicuous in goal-oriented workflows and unable to exploit the compounding costs of multi-turn agent-tool interactions. We introduce a stealthy, multi-turn economic DoS attack that operates at the tool layer under the guise of a correctly completed task. Our method adjusts text-visible fields and a template-governed return policy in a benign, Model Context Protocol (MCP)-compatible tool server, optimizing these edits with a Monte Carlo Tree Search (MCTS) optimizer. These adjustments leave function signatures unchanged and preserve the final payload, steering the agent into prolonged, verbose tool-calling sequences using text-only notices. This compounds costs across turns, escaping single-turn caps while keeping the final answer correct to evade validation. Across six LLMs on the ToolBench and BFCL benchmarks, our attack expands tasks into trajectories exceeding 60,000 tokens, inflates costs by up to 658x, and raises energy by 100-560x. It drives GPU KV cache occupancy from <1% to 35-74% and cuts co-running throughput by approximately 50%. Because the server remains protocol-compatible and task outcomes are correct, conventional checks fail. These results elevate the agent-tool interface to a first-class security frontier, demanding a paradigm shift from validating final answers to monitoring the economic and computational cost of the entire agentic process.
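The claim that costs compound across turns and escape single-turn caps can be illustrated with simple token arithmetic. The sketch below is not taken from the paper: the function name, the parameter values, and the assumption that every turn re-sends the full conversation history with no prompt-caching discount are all hypothetical and chosen only for illustration.

```python
def trajectory_cost(base_prompt: int, turns: int,
                    output_per_turn: int, tool_reply_per_turn: int) -> int:
    """Total tokens processed over a multi-turn tool-calling trajectory.

    Assumption (hypothetical billing model): each model call re-reads the
    whole history so far, and every turn appends the model's output plus
    the tool's reply to that history.
    """
    total = 0
    context = base_prompt
    for _ in range(turns):
        # One model call: read the current context, then generate output.
        total += context + output_per_turn
        # History grows by the model's output and the tool's response.
        context += output_per_turn + tool_reply_per_turn
    return total


# Illustrative numbers only; none of these come from the paper.
one_turn = trajectory_cost(500, 1, 200, 300)     # 700 tokens
forty_turns = trajectory_cost(500, 40, 200, 300)  # 418000 tokens
```

In this toy model every individual call generates only 200 tokens, so a per-call output cap is never hit, yet the 40-turn trajectory processes roughly 597 times as many tokens as the single-turn one because the growing history is re-read on each call. The total grows quadratically with the number of turns, which is why a cap on any single response does not bound the cost of the whole trajectory.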
Key Contributions
- First multi-turn stealthy economic DoS attack operating at the tool/MCP layer, exploiting compounding costs across agent-tool dialogue turns rather than single-turn token limits
- MCTS optimizer that adjusts text-visible tool server fields and return policies to maximize token trajectory length while preserving task correctness and protocol compatibility
- Empirical demonstration across six LLMs on ToolBench and BFCL showing up to 658x cost inflation, a 100–560x energy increase, and a ~50% reduction in co-running throughput while evading conventional output validation
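Because the attack keeps final answers correct, the abstract argues for monitoring the cost of the whole agentic process rather than validating only the output. A minimal sketch of what such a process-level check might look like follows. It is not a defense proposed or evaluated in the paper; the class name `TrajectoryBudget`, its fields, and the thresholds are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a trajectory exceeds its cumulative budget."""


@dataclass
class TrajectoryBudget:
    # Hypothetical limits; real values would depend on the workload.
    max_total_tokens: int
    max_tool_calls: int
    total_tokens: int = 0
    tool_calls: int = 0

    def record_turn(self, input_tokens: int, output_tokens: int,
                    made_tool_call: bool) -> None:
        """Accumulate one turn's usage and enforce trajectory-wide limits."""
        self.total_tokens += input_tokens + output_tokens
        if made_tool_call:
            self.tool_calls += 1
        if self.total_tokens > self.max_total_tokens:
            raise BudgetExceeded(
                f"token budget exceeded: {self.total_tokens} > {self.max_total_tokens}"
            )
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(
                f"tool-call budget exceeded: {self.tool_calls} > {self.max_tool_calls}"
            )
```

An agent loop would call `record_turn` after every model call and abort or escalate on `BudgetExceeded`. Unlike a per-call `max_tokens` setting, the limits here apply to the cumulative trajectory, which is the quantity the attack inflates. Whether fixed thresholds like these would catch the attack without also interrupting legitimately long tasks is an open question that this sketch does not address.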