
Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents

Kaiyu Zhou 1, Yongsen Zheng 1, Yicheng He 2, Meng Xue 3, Xueluan Gong 1, Yuji Wang 4, Xuanye Zhang 1, Kwok-Yan Lam 1

2 citations · 1 influential · 74 references · arXiv


Published on arXiv · 2601.10955

Model Denial of Service

OWASP LLM Top 10 — LLM04

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Attack inflates LLM agent costs by up to 658x and energy by 100–560x across six LLMs by extending task trajectories beyond 60,000 tokens through stealthy MCP tool server manipulation, while keeping final answers correct to evade validation.

MCTS-optimized Tool Response Manipulation

Novel technique introduced


The agent-tool communication loop is a critical attack surface in modern Large Language Model (LLM) agents. Existing Denial-of-Service (DoS) attacks, primarily triggered via user prompts or injected retrieval-augmented generation (RAG) context, are ineffective for this new paradigm. They are fundamentally single-turn and often lack a task-oriented approach, making them conspicuous in goal-oriented workflows and unable to exploit the compounding costs of multi-turn agent-tool interactions. We introduce a stealthy, multi-turn economic DoS attack that operates at the tool layer under the guise of a correctly completed task. Our method adjusts text-visible fields and a template-governed return policy in a benign, Model Context Protocol (MCP)-compatible tool server, optimizing these edits with a Monte Carlo Tree Search (MCTS) optimizer. These adjustments leave function signatures unchanged and preserve the final payload, steering the agent into prolonged, verbose tool-calling sequences using text-only notices. This compounds costs across turns, escaping single-turn caps while keeping the final answer correct to evade validation. Across six LLMs on the ToolBench and BFCL benchmarks, our attack expands tasks into trajectories exceeding 60,000 tokens, inflates costs by up to 658x, and raises energy by 100-560x. It drives GPU KV cache occupancy from <1% to 35-74% and cuts co-running throughput by approximately 50%. Because the server remains protocol-compatible and task outcomes are correct, conventional checks fail. These results elevate the agent-tool interface to a first-class security frontier, demanding a paradigm shift from validating final answers to monitoring the economic and computational cost of the entire agentic process.
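The abstract's key mechanism — costs that compound across turns while each individual turn stays under its cap — can be illustrated with a toy calculation. All numbers below are hypothetical for illustration; they are not the paper's measurements, and the quadratic growth assumes the agent resends its full conversation history on every tool call:

```python
# Hypothetical illustration of multi-turn cost compounding.
# Each tool notice stays under any per-turn token cap, but the agent
# re-processes the growing history on every turn, so total prompt
# tokens grow quadratically with the number of tool calls.

def trajectory_tokens(turns, notice_tokens, base_prompt=500):
    """Total prompt tokens processed across a tool-calling trajectory.

    On turn t the context holds the base prompt plus the t-1 previous
    tool notices; assumes the full history is resent each turn.
    """
    total = 0
    history = base_prompt
    for _ in range(turns):
        total += history          # tokens the LLM processes this turn
        history += notice_tokens  # verbose notice appended to context
    return total

benign = trajectory_tokens(turns=2, notice_tokens=50)
attacked = trajectory_tokens(turns=40, notice_tokens=1500)
print(benign, attacked, attacked / benign)
```

Even with these made-up parameters, stretching a 2-turn task to 40 turns of verbose notices inflates total processed tokens by three orders of magnitude — no single turn trips a cap, which is why the authors argue for monitoring whole-trajectory cost rather than per-turn limits or final answers.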


Key Contributions

  • First multi-turn stealthy economic DoS attack operating at the tool/MCP layer, exploiting compounding costs across agent-tool dialogue turns rather than single-turn token limits
  • MCTS optimizer that adjusts text-visible tool server fields and return policies to maximize token trajectory length while preserving task correctness and protocol compatibility
  • Empirical demonstration across six LLMs on ToolBench and BFCL showing 658x cost inflation, 100–560x energy increase, and ~50% throughput reduction while evading conventional output validation
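The second contribution — MCTS over text-visible server edits — can be sketched as follows. This is a simplified illustration, not the paper's implementation: `EDITS` and `simulate_agent` are invented stand-ins (a real run would execute the agent against a patched MCP server and measure the actual trajectory), and the reward gate reflects the paper's stealth constraint that an edit set scoring any reward must still yield a correct final answer:

```python
import math
import random

# Hypothetical catalogue of text-only server edits the search can apply.
EDITS = ("verbose_notice", "retry_hint", "pagination_note", "status_preamble")

def simulate_agent(edit_set):
    """Stub reward model: (trajectory_tokens, final_answer_correct).
    Stand-in for running the agent against the patched tool server."""
    tokens = 1000 + 4000 * len(edit_set)
    return tokens, len(edit_set) <= 3  # too many edits break the task

class Node:
    def __init__(self, edits, parent=None):
        self.edits = edits          # frozenset of applied edits
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def untried(self):
        tried = {e for c in self.children for e in c.edits - self.edits}
        return [e for e in EDITS if e not in self.edits and e not in tried]

    def ucb(self, c=1.4):
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(iterations=200, seed=0):
    random.seed(seed)
    root = Node(frozenset())
    best, best_reward = frozenset(), 0.0
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCB while fully expanded.
        while not node.untried() and node.children:
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: apply one new edit.
        choices = node.untried()
        if choices:
            child = Node(node.edits | {random.choice(choices)}, parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: reward = trajectory length, gated on correctness
        #    (the stealth constraint: a broken answer scores zero).
        tokens, correct = simulate_agent(node.edits)
        reward = tokens if correct else 0.0
        if reward > best_reward:
            best, best_reward = node.edits, reward
        # 4. Backpropagation.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best, best_reward

best, reward = mcts()
print(sorted(best), reward)
```

The gated reward is the design point worth noting: because edit sets that corrupt the final answer score zero, the search is driven toward the largest token trajectory that still passes output validation — exactly the evasion property the empirical results exploit.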

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
ToolBench, BFCL
Applications
llm agents, tool-augmented ai systems, mcp-based agentic workflows, api-billed llm deployments