Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs
Zachary Coalson 1, Bo Fang 2, Sanghyun Hong 1
Published on arXiv
2602.17778
Model Poisoning
OWASP ML Top 10 — ML10
Model Denial of Service
OWASP LLM Top 10 — LLM04
Key Finding
Fine-tuning and parameter corruption attacks substantially increase multi-turn interaction counts across instruction-tuned LLMs while remaining task-compliant, with existing defenses offering only limited protection.
Turn Amplification
Novel technique introduced
Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in conversational LLMs: turn amplification, in which a model consistently prolongs multi-turn interactions without completing the underlying task. We show that an adversary can systematically exploit clarification-seeking behavior (commonly encouraged in multi-turn conversation settings) to scalably prolong interactions. Moving beyond prompt-level behaviors, we take a mechanistic perspective and identify a query-independent, universal activation subspace associated with clarification-seeking responses. Unlike prior cost-amplification attacks that rely on per-turn prompt optimization, our attack arises from conversational dynamics and persists across prompts and tasks. We show that this mechanism provides a scalable pathway to induce turn amplification: both supply-chain attacks via fine-tuning and runtime attacks through low-level parameter corruptions consistently shift models toward abstract, clarification-seeking behavior across prompts. Across multiple instruction-tuned LLMs and benchmarks, our attack substantially increases turn count while remaining task-compliant. We also show that existing defenses offer limited protection against this emerging class of failures.
Key Contributions
- Identifies 'turn amplification' as a novel failure mode in conversational LLMs in which adversaries exploit clarification-seeking dynamics to scalably inflate multi-turn operational costs
- Mechanistically identifies a query-independent, universal activation subspace associated with clarification-seeking responses that persists across prompts and tasks
- Demonstrates two attack vectors — supply-chain attacks via fine-tuning and runtime attacks via low-level parameter corruption — that persistently induce turn amplification while maintaining apparent compliance
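The "universal activation subspace" contribution above is reminiscent of standard activation-direction analysis. As an illustration only (the paper's actual method is not reproduced here; all function names and the difference-of-means construction are assumptions for this sketch), one simple way to estimate a single clarification-seeking direction from hidden states is to contrast mean activations of clarifying versus answering responses:

```python
import numpy as np

def clarification_direction(clarify_acts, answer_acts):
    """Estimate a 'clarification-seeking' direction as the difference of
    mean hidden-state activations between the two response classes.
    A universal *subspace*, as in the paper, would generalize this to
    several directions (e.g., PCA over per-layer mean differences).

    clarify_acts, answer_acts: arrays of shape (n_samples, hidden_dim).
    """
    mu_clarify = clarify_acts.mean(axis=0)  # mean activation, clarifying turns
    mu_answer = answer_acts.mean(axis=0)    # mean activation, answering turns
    d = mu_clarify - mu_answer
    return d / np.linalg.norm(d)            # unit-norm direction

def clarification_score(activation, direction):
    """Project a new hidden state onto the direction; a large positive
    score suggests drift toward clarification-seeking behavior."""
    return float(activation @ direction)
```

Because the paper reports the subspace is query-independent, a direction fitted on one task's activations would, under that claim, score clarification-seeking states from unseen prompts as well.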
🛡️ Threat Analysis
The paper demonstrates fine-tuning (supply-chain) and runtime parameter corruption attacks that embed persistent clarification-seeking behavior directly in model weights. Although the induced behavior is general rather than trigger-activated, the attack mechanism is direct weight/parameter manipulation to embed targeted malicious behavior, not prompt-level exploitation — i.e., model poisoning.
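Since the attack inflates turn counts while each individual response still looks plausible, one coarse serving-side signal is a session's turn count relative to a benign baseline. The sketch below is a hypothetical z-score monitor, not a defense from the paper — which, notably, reports that existing defenses give only limited protection against this failure mode:

```python
import statistics

class TurnCountMonitor:
    """Flag conversations whose turn count is anomalous relative to a
    baseline of benign sessions. Illustrative heuristic only; a model
    poisoned for turn amplification could still keep per-session turn
    counts just below any fixed threshold."""

    def __init__(self, baseline_turn_counts, z_threshold=3.0):
        self.mean = statistics.fmean(baseline_turn_counts)
        # Guard against a zero-variance baseline.
        self.std = statistics.pstdev(baseline_turn_counts) or 1.0
        self.z_threshold = z_threshold

    def is_anomalous(self, turn_count):
        """Return True if the session's turn count exceeds the baseline
        mean by more than z_threshold standard deviations."""
        z = (turn_count - self.mean) / self.std
        return z > self.z_threshold
```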