LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation
Xingyu Li 1,2, Xiaolei Liu 1,2, Cheng Liu 1,2, Yixiao Xu 3, Kangyi Ding 1,2, Bangzhou Xin 1,2, Jia-Li Yin 4
1 National Interdisciplinary Research Center of Engineering Physics
2 Institute of Computer Application, China Academy of Engineering Physics
Published on arXiv: 2511.07876
Model Denial of Service
OWASP LLM Top 10 — LLM04
Key Finding
LoopLLM achieves over 90% of the maximum output length across 12 open-source LLMs (vs. ~20% for baselines) and improves black-box transferability by ~40% to commercial models DeepSeek-V3 and Gemini 2.5 Flash.
LoopLLM
Novel technique introduced
As large language models (LLMs) scale, inference consumes substantial computational resources, exposing them to energy-latency attacks, in which crafted prompts induce high energy and latency costs. Existing attack methods aim to prolong output by delaying the generation of termination symbols. However, as outputs grow longer, controlling termination symbols through the input becomes difficult, making these methods less effective. We therefore propose LoopLLM, an energy-latency attack framework built on the observation that repetitive generation can trigger low-entropy decoding loops, reliably compelling LLMs to generate until their output limits. LoopLLM introduces (1) a repetition-inducing prompt optimization that exploits autoregressive vulnerabilities to induce repetitive generation, and (2) a token-aligned ensemble optimization that aggregates gradients across surrogate models to improve cross-model transferability. Extensive experiments on 12 open-source and 2 commercial LLMs show that LoopLLM significantly outperforms existing methods, achieving over 90% of the maximum output length, compared with about 20% for baselines, and improving transferability by around 40% to DeepSeek-V3 and Gemini 2.5 Flash.
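The core mechanism can be illustrated with a toy example (not the paper's code): once greedy decoding enters a near-deterministic repetitive cycle, the termination token is never the argmax, so generation runs until the hard output limit. The `next_token_dist` function below is a hypothetical stand-in for an LLM's next-token distribution, with probabilities chosen for illustration.

```python
# Toy sketch: a low-entropy decoding loop prevents EOS from ever being
# selected under greedy decoding, forcing output to the length cap.

EOS = "<eos>"
MAX_NEW_TOKENS = 64

def next_token_dist(context):
    """Hypothetical next-token distribution. Once the model has emitted a
    repeated bigram, probability mass collapses onto continuing the cycle
    (a low-entropy decoding loop)."""
    if context[-2:] == ["loop", "loop"]:
        return {"loop": 0.99, EOS: 0.01}   # near-deterministic repetition
    return {EOS: 0.5, "ok": 0.3, "loop": 0.2}

def greedy_decode(prompt_tokens):
    out = list(prompt_tokens)
    for _ in range(MAX_NEW_TOKENS):
        dist = next_token_dist(out)
        tok = max(dist, key=dist.get)      # greedy: pick the argmax token
        if tok == EOS:
            break                          # normal termination
        out.append(tok)
    return out[len(prompt_tokens):]

# A benign prompt terminates immediately; an adversarial prompt that
# seeds the repetitive cycle generates until the output limit.
print(len(greedy_decode(["hello"])))                    # → 0
print(len(greedy_decode(["repeat", "loop", "loop"])))   # → 64 (MAX_NEW_TOKENS)
```

This is why the attack sidesteps the difficulty of suppressing termination symbols directly: inside the loop, EOS never competes for the argmax in the first place.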
Key Contributions
- Repetition-inducing prompt optimization that exploits autoregressive vulnerabilities to lock LLMs into low-entropy decoding loops, reliably reaching maximum output length
- Token-aligned ensemble optimization that aggregates gradients across multiple surrogate models sharing a tokenizer to improve cross-model transferability
- Empirical evaluation on 12 open-source and 2 commercial LLMs showing >90% max output length achieved (vs. ~20% for baselines) and ~40% transferability gain to DeepSeek-V3 and Gemini 2.5 Flash
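The token-aligned ensemble idea can be sketched as follows. All names, shapes, and the GCG-style coordinate update are illustrative assumptions, not the paper's implementation; the point is only that surrogate models sharing a tokenizer produce gradients in the same (position, vocabulary) space, so they can be summed directly before ranking candidate token substitutions.

```python
# Hedged sketch of token-aligned gradient aggregation across surrogates.
import numpy as np

VOCAB_SIZE, PROMPT_LEN = 100, 8
rng = np.random.default_rng(0)

def surrogate_gradient(model_seed, prompt_ids):
    """Stand-in for d(loss)/d(one-hot prompt) from one surrogate model.
    A real attack would backpropagate an adversarial objective through
    each surrogate; here a seeded random matrix plays that role."""
    g_rng = np.random.default_rng(model_seed)
    return g_rng.normal(size=(PROMPT_LEN, VOCAB_SIZE))

def ensemble_step(prompt_ids, model_seeds):
    # Token alignment: a shared tokenizer means every surrogate's gradient
    # lives in the same (position, vocab) space, so a plain sum aggregates
    # them without any remapping.
    agg = sum(surrogate_gradient(s, prompt_ids) for s in model_seeds)
    # Greedily take the (position, token) substitution with the steepest
    # aggregated descent direction (a GCG-style coordinate step).
    pos, tok = np.unravel_index(np.argmin(agg), agg.shape)
    new_prompt = prompt_ids.copy()
    new_prompt[pos] = tok
    return new_prompt

prompt = rng.integers(0, VOCAB_SIZE, size=PROMPT_LEN)
updated = ensemble_step(prompt, model_seeds=[1, 2, 3])
print(int((updated != prompt).sum()))  # at most one position is substituted
```

Aggregating before the substitution step is what drives transferability: a token update must lower the loss of every surrogate simultaneously, rather than overfitting to one.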