
ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models

Xiaogeng Liu 1,2, Xinyan Wang 3, Yechao Zhang 4, Sanjay Kariyappa 2, Chong Xiang 2, Muhao Chen 5, G. Edward Suh 2,6, Chaowei Xiao 1,2

0 citations · 53 references · arXiv

Published on arXiv

2602.00154

Model Denial of Service

OWASP LLM Top 10 — LLM04

Key Finding

ReasoningBomb induces 286.7x input-to-output token amplification on average, outperforms the best baseline by 38% in reasoning tokens, and evades detection with >98.4% bypass rate against dual-stage joint detection.

ReasoningBomb

Novel technique introduced


Large reasoning models (LRMs) extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-service (PI-DoS) attacks that exploit the high computational cost of reasoning. We first formalize inference cost for LRMs and define PI-DoS, then argue that any practical PI-DoS attack must satisfy three properties: (i) a high amplification ratio, where each query induces a disproportionately long reasoning trace relative to its own length; (ii) stealthiness, in which prompts and responses remain on the natural-language manifold and evade distribution-shift detectors; and (iii) optimizability, in which the attack supports efficient optimization without being slowed by its own success. Under this framework, we present ReasoningBomb, a reinforcement-learning-based PI-DoS framework that is guided by a constant-time surrogate reward and trains a large-reasoning-model attacker to generate short natural prompts that drive victim LRMs into pathologically long and often effectively non-terminating reasoning. Across seven open-source models (including LLMs and LRMs) and three commercial LRMs, ReasoningBomb induces 18,759 completion tokens on average and 19,263 reasoning tokens on average across reasoning models. It outperforms the runner-up baseline by 35% in completion tokens and 38% in reasoning tokens, while inducing 6-7x more tokens than benign queries and achieving a 286.7x input-to-output amplification ratio averaged across all samples. Additionally, our method achieves a 99.8% bypass rate on input-based detection, 98.7% on output-based detection, and 98.4% against strict dual-stage joint detection.
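The amplification ratio from property (i) can be illustrated with a minimal sketch. The function and the token counts below are hypothetical stand-ins for illustration, not the paper's measurement code or data:

```python
# Hedged sketch: the input-to-output token amplification ratio described
# in the abstract. Token counts below are illustrative, not reproductions
# of the paper's measurements.

def amplification_ratio(prompt_tokens: int, completion_tokens: int) -> float:
    """Ratio of tokens the victim generates to tokens the attacker sends."""
    return completion_tokens / prompt_tokens

# e.g. a hypothetical 65-token adversarial prompt triggering the paper's
# average of 18,759 completion tokens
print(f"{amplification_ratio(65, 18_759):.1f}x")  # -> 288.6x
```

A short prompt that reliably triggers tens of thousands of completion tokens is what makes the attack an effective denial-of-service primitive: the defender pays orders of magnitude more compute per request than the attacker.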


Key Contributions

  • Formal characterization of PI-DoS attacks via three necessary properties: amplification ratio, stealthiness, and optimizability
  • ReasoningBomb: two-stage SFT + GRPO-based RL framework using a constant-time MLP surrogate reward (from victim hidden states) and a diversity reward to train a short-prompt attacker
  • Empirical demonstration across 10 victim models (7 open-source, 3 commercial) achieving 18,759 avg completion tokens, 286.7x amplification, and >98% bypass rate against input-, output-, and dual-stage detectors
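The constant-time surrogate reward in the second contribution can be sketched as a small frozen MLP probe over victim hidden states. Everything below is a hypothetical toy (shapes, weights, and the log-length target are assumptions, not the paper's trained probe); it only shows why the reward is O(1): the attacker queries a cheap forward pass instead of waiting for the victim's full reasoning trace:

```python
import numpy as np

# Toy sketch of a constant-time surrogate reward: a small frozen MLP that
# maps one victim hidden-state vector to a predicted reasoning length.
# In the paper's setting such a probe would be fit offline on
# (hidden state, observed reasoning length) pairs; here the weights are
# random placeholders.

rng = np.random.default_rng(0)
HIDDEN, PROBE = 64, 32  # victim hidden size and probe width (toy values)

W1 = rng.normal(0, 0.1, (HIDDEN, PROBE))
b1 = np.zeros(PROBE)
W2 = rng.normal(0, 0.1, (PROBE, 1))
b2 = np.zeros(1)

def surrogate_reward(hidden_state: np.ndarray) -> float:
    """O(1) forward pass predicting reasoning length from one hidden state."""
    h = np.maximum(hidden_state @ W1 + b1, 0.0)  # ReLU hidden layer
    return float(h @ W2 + b2)                    # scalar predicted length

h = rng.normal(size=HIDDEN)  # stand-in for a victim prompt hidden state
print(surrogate_reward(h))
```

This is what the abstract means by the attack "supports efficient optimization without being slowed by its own success": scoring a candidate prompt never requires generating the pathologically long trace it is designed to induce.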

🛡️ Threat Analysis


Details

Domains
nlp, reinforcement-learning
Model Types
llm, transformer, rl
Threat Tags
grey_box, black_box, inference_time
Applications
large reasoning model inference servers, llm api services