One Token Embedding Is Enough to Deadlock Your Large Reasoning Model
Mohan Zhang¹, Yihua Zhang², Jinghan Jia², Zhangyang Wang³, Sijia Liu², Tianlong Chen¹
Published on arXiv
arXiv:2510.15965
Model Poisoning
OWASP ML Top 10 — ML10
Model Denial of Service
OWASP LLM Top 10 — LLM04
Key Finding
The Deadlock Attack achieves a 100% attack success rate across four advanced LRMs on three math reasoning benchmarks, forcing the models to exhaust their maximum token budget on every triggered query while remaining stealthy on benign inputs.
Deadlock Attack
Novel technique introduced
Modern large reasoning models (LRMs) exhibit impressive multi-step problem-solving via chain-of-thought (CoT) reasoning. However, this iterative thinking mechanism introduces a new vulnerability surface. We present the Deadlock Attack, a resource-exhaustion method that hijacks an LRM's generative control flow by training a malicious adversarial embedding to induce perpetual reasoning loops. Specifically, the optimized embedding encourages transitional tokens (e.g., "Wait", "But") after each reasoning step, preventing the model from concluding its answer. A key challenge we identify is the continuous-to-discrete projection gap: naïvely projecting adversarial embeddings to token sequences nullifies the attack. To overcome this, we introduce a backdoor implantation strategy that enables reliable activation through specific trigger tokens. Our method achieves a 100% attack success rate across four advanced LRMs (Phi-RM, Nemotron-Nano, R1-Qwen, R1-Llama) and three math reasoning benchmarks, forcing models to generate up to their maximum token limits. The attack is also stealthy, causing negligible utility loss on benign user inputs, and remains robust against existing strategies for mitigating overthinking. Our findings expose a critical and underexplored security vulnerability in LRMs from the perspective of reasoning (in)efficiency.
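The core idea of the adversarial embedding can be illustrated in miniature. The sketch below is not the paper's implementation; it assumes a toy linear language head (`W`, `vocab`, and the single-step model are invented for illustration) and runs gradient ascent on a continuous embedding so that a transitional token like "Wait" dominates the next-token distribution, which is the behavior the attack encourages after each reasoning step.

```python
import numpy as np

# Toy sketch (NOT the paper's code): optimize a continuous adversarial
# embedding e so a toy linear output head favors a transitional token.
rng = np.random.default_rng(0)
vocab = ["Wait", "But", "Answer", "</think>"]  # illustrative mini-vocabulary
d = 8
W = rng.normal(size=(len(vocab), d))           # toy output head: logits = W @ e
target = vocab.index("Wait")                   # transitional token to encourage

def probs(e):
    """Softmax over the toy head's logits."""
    logits = W @ e
    z = np.exp(logits - logits.max())
    return z / z.sum()

e = rng.normal(size=d)                         # the continuous adversarial embedding
p0 = probs(e)[target]
for _ in range(200):
    p = probs(e)
    # Gradient of log p(target) w.r.t. e for a linear-softmax head:
    # ∇_e log p(target) = W[target] - Σ_j p_j W[j]
    e += 0.1 * (W[target] - p @ W)             # ascent on log-likelihood of "Wait"
p1 = probs(e)[target]
print(f"p('Wait') before: {p0:.3f}, after: {p1:.3f}")
```

Note that `e` lives in continuous embedding space; the paper's continuous-to-discrete projection gap is precisely that snapping such an optimized vector to the nearest real tokens destroys the effect, which motivates the backdoor implantation strategy instead.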
Key Contributions
- Deadlock Attack: adversarial embedding optimization that induces perpetual chain-of-thought reasoning loops in LRMs, preventing conclusion generation
- Identification of the continuous-to-discrete projection gap as the key obstacle to practical deployment, and a backdoor implantation strategy to overcome it via trigger tokens
- Empirical demonstration of 100% attack success rate across four LRMs (Phi-RM, Nemotron-Nano, R1-Qwen, R1-Llama) with negligible benign utility loss and robustness against anti-overthinking mitigations
🛡️ Threat Analysis
The attack's core mechanism is a backdoor implantation strategy: the model is fine-tuned so that specific trigger tokens reliably activate the deadlock behavior (perpetual reasoning loops). The model behaves normally on benign inputs and maliciously only when the trigger is present — a textbook backdoor/trojan.
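Since the attack's signature is an abundance of transitional tokens with no conclusion, one hypothetical (not from the paper) runtime guard is to flag generations that keep transitioning without ever closing the reasoning phase. The marker list and threshold below are illustrative assumptions:

```python
# Hypothetical runtime guard (not proposed in the paper): flag a generation
# as a suspected deadlock when transitional markers keep appearing without
# a closing answer. Markers and threshold are illustrative choices.
TRANSITIONS = ("Wait", "But", "Hmm", "Alternatively")

def suspected_deadlock(text: str, max_transitions: int = 20) -> bool:
    """True if the CoT shows many transition markers but never concludes."""
    n = sum(text.count(t) for t in TRANSITIONS)
    concluded = "</think>" in text or "Final Answer" in text
    return n >= max_transitions and not concluded

looping = "Wait, let me reconsider. " * 25
assert suspected_deadlock(looping)                            # loops forever
assert not suspected_deadlock(looping + "</think> Final Answer: 42")
```

A guard like this is necessarily heuristic: the paper reports the attack remains robust against existing anti-overthinking mitigations, so simple token counting should be treated as monitoring, not a fix.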