Diversity Boosts AI-Generated Text Detection

Detecting AI-generated text is increasingly necessary to combat the misuse of LLMs in education, business compliance, journalism, and social media, where synthetic fluency can mask misinformation or deception. Prior detectors often rely on token-level likelihoods or opaque black-box classifiers; these approaches struggle against high-quality generations and offer little interpretability. In this work, we propose DivEye, a novel detection framework that uses surprisal-based features to capture how unpredictability fluctuates across a text. Motivated by the observation that human-authored text exhibits richer variability in lexical and structural unpredictability than LLM outputs, DivEye encodes this signal as a set of interpretable statistical features. Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines across multiple benchmarks. DivEye is robust to paraphrasing and adversarial attacks, generalizes well across domains and models, and improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal. Beyond detection, DivEye offers interpretable insight into why a text is flagged, pointing to rhythmic unpredictability as a powerful and underexplored signal for LLM detection.
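A minimal sketch of the kind of signal described above, assuming per-token probabilities have already been obtained from some scoring language model. Which model DivEye uses, and its exact feature set, are not specified in this summary. Surprisal is taken as s_t = -log p_t, and "fluctuation" is summarized here by moments of the surprisal sequence and of its first differences. The function name `surprisal_diversity_features` and the chosen statistics are hypothetical illustrations, not the paper's definition.

```python
import math
import statistics
from typing import Dict, Sequence


def surprisal_diversity_features(token_probs: Sequence[float]) -> Dict[str, float]:
    """Hypothetical illustration: summarize how token surprisal fluctuates.

    token_probs: probability a scoring LM assigned to each observed token
    (assumed to be given; computing them is out of scope here).
    """
    if len(token_probs) < 3:
        raise ValueError("need at least 3 tokens")
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Surprisal of each token in nats: s_t = -log p_t
    s = [-math.log(p) for p in token_probs]
    # First differences: token-to-token jumps in unpredictability
    ds = [b - a for a, b in zip(s, s[1:])]
    mean = statistics.fmean(s)
    std = statistics.pstdev(s)
    # Population skewness and excess kurtosis (defined as 0.0 for a flat sequence)
    if std > 0:
        z = [(x - mean) / std for x in s]
        skew = statistics.fmean(v ** 3 for v in z)
        kurt = statistics.fmean(v ** 4 for v in z) - 3.0
    else:
        skew = kurt = 0.0
    return {
        "mean": mean,
        "std": std,
        "skew": skew,
        "kurtosis": kurt,
        "diff_mean": statistics.fmean(ds),
        "diff_std": statistics.pstdev(ds),
    }
```

Under the motivation stated above, one would expect human-written text to show larger `std` and `diff_std` values than LLM output; this sketch only computes the statistics and makes no claim about decision thresholds.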

Key Contributions

DivEye: a zero-shot detector using interpretable surprisal-based diversity features that capture fluctuations in lexical and structural unpredictability, which distinguish human-written from LLM-generated text
Language- and model-agnostic detection requiring no access to the generator model, generalizing across domains, languages, and model families
Complementary signal that boosts existing detectors by up to 18.7% when used as an auxiliary feature, with demonstrated robustness to paraphrasing and adversarial attacks

🛡️ Threat Analysis

Output Integrity Attack

DivEye is a novel AI-generated text detector that verifies content authenticity and provenance, directly targeting output integrity. It proposes new surprisal-based diversity features as a forensic signal, which places it squarely within ML09's AI-generated content detection scope.

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

inference_time, black_box

Datasets

multiple benchmarks (unspecified in excerpt)

Applications

2026 · 0 citations
