Attack · 2025

Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space

Xingfu Zhou, Pengfei Wang

2 citations · 16 references · arXiv


Published on arXiv · 2512.14448

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

GSI increases agent reasoning steps by up to 4.4× or induces premature errors across ReAct, Reflection, and Tree-of-Thoughts architectures while successfully evading state-of-the-art content and instruction-injection filters.

Generative Style Injection (GSI)

Novel technique introduced


Large Language Model (LLM) agents relying on external retrieval are increasingly deployed in high-stakes environments. While existing adversarial attacks primarily focus on content falsification or instruction injection, we identify a novel, process-oriented attack surface: the agent's reasoning style. We propose Reasoning-Style Poisoning (RSP), a paradigm that manipulates how agents process information rather than what they process. We introduce Generative Style Injection (GSI), an attack method that rewrites retrieved documents into pathological tones ("analysis paralysis" or "cognitive haste") without altering underlying facts or using explicit triggers. To quantify these shifts, we develop the Reasoning Style Vector (RSV), a metric tracking Verification depth, Self-confidence, and Attention focus. Experiments on HotpotQA and FEVER using ReAct, Reflection, and Tree of Thoughts (ToT) architectures reveal that GSI significantly degrades performance: it increases reasoning steps by up to 4.4 times or induces premature errors, successfully bypassing state-of-the-art content filters. Finally, we propose RSP-M, a lightweight runtime monitor that calculates RSV metrics in real time and triggers alerts when values exceed safety thresholds. Our work demonstrates that reasoning style is a distinct, exploitable vulnerability, necessitating process-level defenses beyond static content analysis.
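The abstract defines the RSV as three trace-level signals (Verification depth, Self-confidence, Attention focus) but gives no formulas here. A minimal sketch of how such a vector could be computed from an agent's reasoning steps is shown below; the marker lists, the normalizations, and the function names are illustrative assumptions, not the authors' definitions.

```python
from dataclasses import dataclass

# Hypothetical marker sets; the paper does not specify how each RSV
# component is measured, so these are illustrative assumptions.
VERIFY_MARKERS = ("let me verify", "double-check", "re-examine", "cross-check")
CONFIDENT_MARKERS = ("clearly", "obviously", "certainly", "definitely")
HEDGE_MARKERS = ("might", "perhaps", "possibly", "not sure")

@dataclass
class RSV:
    verification: float  # fraction of steps containing a verification act
    confidence: float    # confident minus hedging markers, squashed into [0, 1]
    focus: float         # fraction of steps that mention the original question's terms

def count_markers(text: str, markers: tuple) -> int:
    t = text.lower()
    return sum(t.count(m) for m in markers)

def reasoning_style_vector(steps: list, question_terms: list) -> RSV:
    """Compute an illustrative RSV over a list of reasoning-step strings."""
    n = max(len(steps), 1)
    verif = sum(1 for s in steps if count_markers(s, VERIFY_MARKERS)) / n
    conf_raw = sum(
        count_markers(s, CONFIDENT_MARKERS) - count_markers(s, HEDGE_MARKERS)
        for s in steps
    )
    # Map average marker balance from [-1, 1] into [0, 1].
    conf = 0.5 + 0.5 * max(-1.0, min(1.0, conf_raw / n))
    focus = sum(
        1 for s in steps if any(t in s.lower() for t in question_terms)
    ) / n
    return RSV(verif, conf, focus)
```

Under this sketch, an "analysis paralysis" trace would show verification drifting toward 1.0 with confidence falling, while "cognitive haste" would show the opposite pattern.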


Key Contributions

  • Reasoning-Style Poisoning (RSP): a novel process-level attack paradigm that manipulates *how* LLM agents reason (verification depth, self-confidence, attention focus) by injecting pathological epistemic tones into retrieved documents without altering facts or using explicit triggers.
  • Generative Style Injection (GSI): an adversarial style-transfer attack that rewrites retrieval-corpus documents into "analysis paralysis" or "cognitive haste" styles, increasing reasoning steps by up to 4.4× or inducing premature errors while bypassing state-of-the-art content and instruction detectors (e.g., PIGuard).
  • RSP-M: a lightweight runtime monitor that tracks the Reasoning Style Vector (RSV) across agent traces in real-time and raises alerts when style drift exceeds safety thresholds, providing process-level defense beyond static input filtering.
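The RSP-M contribution describes a monitor that tracks RSV values across a trace and alerts when style drift exceeds safety thresholds. A self-contained sketch of that alerting loop is below; the safe-band values and the consecutive-step ("patience") logic are illustrative assumptions, not the paper's calibration.

```python
class RSVMonitor:
    """Illustrative RSP-M-style monitor: flags reasoning-style drift when
    any RSV component leaves its calibrated safe band for several
    consecutive steps. Bands and patience are hypothetical parameters."""

    def __init__(self, bands: dict, patience: int = 3):
        # bands: {"verification": (lo, hi), "confidence": (lo, hi), "focus": (lo, hi)}
        self.bands = bands
        self.patience = patience
        self.streak = 0  # consecutive out-of-band observations

    def observe(self, rsv: dict) -> bool:
        """Feed one per-step RSV reading (values in [0, 1]).
        Returns True when drift has persisted long enough to alert."""
        out_of_band = any(
            not (lo <= rsv[key] <= hi) for key, (lo, hi) in self.bands.items()
        )
        self.streak = self.streak + 1 if out_of_band else 0
        return self.streak >= self.patience
```

Requiring a sustained streak rather than a single out-of-band reading is one plausible way to keep a lightweight runtime monitor from alerting on ordinary step-to-step variance.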

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
HotpotQA · FEVER
Applications
rag-based llm agents · multi-hop question answering · autonomous agent pipelines