defense 2026

Variation is the Key: A Variation-Based Framework for LLM-Generated Text Detection

Xuecong Li , Xiaohong Li , Qiang Hu , Yao Zhang , Junjie Wang

0 citations · 24 references · arXiv (Cornell University)

Published on arXiv: 2602.13226

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

VaryBalance outperforms the state-of-the-art detector Binoculars by up to 34.3% in overall AUROC on formal writing contexts, evaluated across eight datasets and five LLMs

VaryBalance

Novel technique introduced


Detecting text generated by large language models (LLMs) is crucial but challenging. Existing detectors either depend on impractical assumptions, such as white-box access to the generating model, or rely solely on text-level features, which leads to imprecise detection. In this paper, we propose a simple, effective, and practical method for detecting LLM-generated text, VaryBalance. The core observation behind VaryBalance is that human texts differ more from their LLM-rewritten versions than LLM-generated texts differ from theirs. VaryBalance quantifies this difference through the mean standard deviation and uses it to distinguish human texts from LLM-generated texts. Comprehensive experiments demonstrate that VaryBalance outperforms the state-of-the-art detector, Binoculars, by up to 34.3% in terms of AUROC, and remains robust across multiple generating models and languages.
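The scoring idea can be illustrated with a minimal sketch. The exact VaryBalance formula is not given in this summary, so the aggregation below is an assumption: each text is rewritten several times in one or more rounds, the log-perplexity of each rewrite is measured by some scoring model, the standard deviation is taken within each round, and the per-round values are averaged. The function names, the grouping into rounds, and the threshold are hypothetical; the token log-probabilities would come from whatever scoring model is available.

```python
import statistics

def log_perplexity(token_logprobs):
    """Log-perplexity of a text: the negative mean token log-probability."""
    return -sum(token_logprobs) / len(token_logprobs)

def mean_std_score(rewrite_logppls_by_round):
    """Average, over rounds, of the std. dev. of rewrite log-perplexities.

    Assumption: each inner list holds the log-perplexities of the rewrites
    produced in one round; this grouping is illustrative, not the paper's.
    """
    stds = [statistics.pstdev(group) for group in rewrite_logppls_by_round]
    return sum(stds) / len(stds)

def looks_human(score, threshold):
    """Larger variation is treated as evidence of human authorship.

    The threshold is hypothetical and would be tuned on held-out data.
    """
    return score > threshold
```

In this sketch a text whose rewrites have widely spread log-perplexities receives a higher score, which matches the stated observation that human texts vary more under rewriting than LLM-generated texts.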


Key Contributions

  • Empirical observation that human texts exhibit greater log-perplexity variation across LLM rewrites than LLM-generated texts do
  • VaryBalance: a black-box detector that uses mean standard deviation of rewritten-text log perplexities to distinguish human vs. LLM-generated text
  • Extended scoring variant for short or stylistically diverse social media text; outperforms Binoculars by up to 34.3% AUROC across eight datasets and five models
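Since the comparison with Binoculars is reported in AUROC, a short sketch of that metric may help. AUROC equals the probability that a randomly chosen positive example receives a higher detector score than a randomly chosen negative one, with ties counted as half. Which class is treated as positive, and the example scores in the comment, are illustrative assumptions and not data from the paper.

```python
def auroc(pos_scores, neg_scores):
    """AUROC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example scores higher; tied pairs contribute one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical usage: variation scores for human texts (positive class)
# and LLM-generated texts (negative class).
# auroc([0.9, 0.8], [0.1, 0.85])
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be preferable for large evaluation sets.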

🛡️ Threat Analysis

Output Integrity Attack

VaryBalance is an AI-generated content detector that distinguishes human-written from LLM-generated text — directly addressing output integrity and content provenance, the core concern of ML09.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Datasets
eight datasets (formal writing and social media, names not fully specified in excerpt)
Applications
llm-generated text detection, academic integrity, misinformation detection, social media content moderation