Overthinking Loops in Agents: A Structural Risk via MCP Tools
Yohan Lee 1, Jisoo Jang 2, Seoyeon Choi 2, Sangyeop Kim 3, Seungtaek Choi 2
Published on arXiv
2602.14798
Model Denial of Service
OWASP LLM Top 10 — LLM04
Insecure Plugin Design
OWASP LLM Top 10 — LLM07
Key Finding
Malicious MCP tools induce cyclic agent trajectories causing up to 142.4× token inflation across multiple tool-capable LLMs, with decoding-time concision controls unable to prevent loop induction.
Structural Overthinking Attack
Novel technique introduced
Tool-using LLM agents increasingly coordinate real workloads by selecting and chaining third-party tools based on text-visible metadata such as tool names, descriptions, and return messages. We show that this convenience creates a supply-chain attack surface: a malicious MCP tool server can be co-registered alongside normal tools and induce overthinking loops, where individually trivial or plausible tool calls compose into cyclic trajectories that inflate end-to-end tokens and latency without any single step looking abnormal. We formalize this as a structural overthinking attack, distinguishable from token-level verbosity, and implement 14 malicious tools across three servers that trigger repetition, forced refinement, and distraction. Across heterogeneous registries and multiple tool-capable models, the attack causes severe resource amplification (up to $142.4\times$ tokens) and can degrade task outcomes. Finally, we find that decoding-time concision controls do not reliably prevent loop induction, suggesting defenses should reason about tool-call structure rather than tokens alone.
Key Contributions
- Formalizes 'structural overthinking attacks' — cyclic tool-call trajectories in LLM agents that inflate tokens/latency without any single step appearing anomalous
- Implements 14 malicious MCP tools across 3 servers embodying three loop strategies (repetition, forced refinement, distraction), achieving up to 142.4× token amplification
- Demonstrates that decoding-time concision controls (e.g., verbosity penalties) are insufficient defenses, motivating tool-call-structure-aware mitigations