defense 2026

When AI Settles Down: Late-Stage Stability as a Signature of AI-Generated Text Detection

Ke Sun, Guangsheng Bao, Han Cui, Yue Zhang

1 citation · 24 references · arXiv

Published on arXiv

2601.04833

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art zero-shot AI text detection on EvoBench and MAGE by exploiting 24–32% lower log-probability volatility in the late stage of AI-generated sequences versus human text.

Late-Stage Volatility Decay (Derivative Dispersion + Local Volatility)

Novel technique introduced


Zero-shot detection methods for AI-generated text typically aggregate token-level statistics across entire sequences, overlooking the temporal dynamics inherent to autoregressive generation. We analyze over 120k text samples and reveal Late-Stage Volatility Decay: in AI-generated text, log-probability fluctuations stabilize rapidly as generation progresses, while human writing maintains higher variability throughout. This divergence peaks in the second half of sequences, where AI-generated text shows 24–32% lower volatility. Based on this finding, we propose two simple features, Derivative Dispersion and Local Volatility, which are computed exclusively from late-stage statistics. Without perturbation sampling or additional model access, our method achieves state-of-the-art performance on the EvoBench and MAGE benchmarks and demonstrates strong complementarity with existing global methods.
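To make the idea of late-stage statistics concrete, here is a minimal sketch of how two such features might be computed from a list of per-token log probabilities. The exact definitions used in the paper are not given in this summary, so everything below is an assumption for illustration: Derivative Dispersion is taken to be the standard deviation of first differences over the second half of the sequence, Local Volatility is taken to be the mean sliding-window standard deviation over the same span, and the names `late_stage`, `start_frac`, and `window` are hypothetical.

```python
from statistics import mean, pstdev


def late_stage(logprobs, start_frac=0.5):
    """Return the tail of the sequence (the second half by default)."""
    start = int(len(logprobs) * start_frac)
    return logprobs[start:]


def derivative_dispersion(logprobs, start_frac=0.5):
    """ASSUMED definition: std of first differences over the late stage."""
    tail = late_stage(logprobs, start_frac)
    if len(tail) < 2:
        raise ValueError("need at least two late-stage tokens")
    diffs = [b - a for a, b in zip(tail, tail[1:])]
    return pstdev(diffs)


def local_volatility(logprobs, window=5, start_frac=0.5):
    """ASSUMED definition: mean sliding-window std over the late stage."""
    tail = late_stage(logprobs, start_frac)
    if len(tail) < window:
        raise ValueError("late stage is shorter than the window")
    windows = [tail[i:i + window] for i in range(len(tail) - window + 1)]
    return mean(pstdev(w) for w in windows)


# Toy log-prob sequences (made up, not data from the paper):
# both share the same first half and differ only in the late stage.
steady = [-2.0] * 10 + [-1.0, -1.1] * 5
jumpy = [-2.0] * 10 + [-0.5, -4.0, -1.0, -6.0, -0.2, -3.5, -0.8, -5.0, -0.3, -4.2]

for name, seq in [("steady", steady), ("jumpy", jumpy)]:
    print(name, derivative_dispersion(seq), local_volatility(seq))
```

Under these assumed definitions, a sequence whose late-stage log probabilities settle down scores lower on both features than one that keeps fluctuating, which is the direction of the effect described above. A real detector would obtain the log probabilities from a scoring language model and choose a decision threshold on held-out data.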


Key Contributions

  • Discovery of Late-Stage Volatility Decay: AI-generated text exhibits 24–32% lower log-probability volatility in the second half of sequences compared to human text
  • Two zero-shot detection features — Derivative Dispersion and Local Volatility — computed exclusively from late-stage token statistics without perturbation sampling or additional model access
  • State-of-the-art detection performance on EvoBench and MAGE benchmarks with strong complementarity to existing global methods

🛡️ Threat Analysis

Output Integrity Attack

Proposes novel AI-generated text detection features (Derivative Dispersion, Local Volatility) that identify whether text was produced by an LLM — directly addressing output integrity and content authenticity.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Datasets
EvoBench, MAGE
Applications
ai-generated text detection, llm output attribution