tool 2025

Diversity Boosts AI-Generated Text Detection

Advik Raj Basani 1, Pin-Yu Chen 2

4 citations · 1 influential · 108 references · arXiv

Published on arXiv

2509.18880

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Outperforms existing zero-shot detectors by up to 33.2% and improves performance of existing detectors by up to 18.7% when used as an auxiliary signal, while remaining robust to paraphrasing and adversarial attacks.

DivEye

Novel technique introduced


Detecting AI-generated text is an increasing necessity to combat misuse of LLMs in education, business compliance, journalism, and social media, where synthetic fluency can mask misinformation or deception. While prior detectors often rely on token-level likelihoods or opaque black-box classifiers, these approaches struggle against high-quality generations and offer little interpretability. In this work, we propose DivEye, a novel detection framework that captures how unpredictability fluctuates across a text using surprisal-based features. Motivated by the observation that human-authored text exhibits richer variability in lexical and structural unpredictability than LLM outputs, DivEye captures this signal through a set of interpretable statistical features. Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines across multiple benchmarks. DivEye is robust to paraphrasing and adversarial attacks, generalizes well across domains and models, and improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal. Beyond detection, DivEye provides interpretable insights into why a text is flagged, pointing to rhythmic unpredictability as a powerful and underexplored signal for LLM detection.
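To make the idea of "how unpredictability fluctuates across a text" concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-token log-probabilities have already been obtained from some scoring language model, and the function name `surprisal_features` and the four statistics it returns are illustrative choices; the excerpt does not list DivEye's actual feature set.

```python
import statistics


def surprisal_features(token_logprobs):
    """Summarize how surprisal varies across a token sequence.

    token_logprobs: natural-log probabilities that a scoring model
    (assumed, not specified by the source) assigned to each observed token.
    """
    if len(token_logprobs) < 3:
        raise ValueError("need at least 3 token log-probabilities")

    # Surprisal of a token is its negative log-probability.
    surprisal = [-lp for lp in token_logprobs]

    # Level and spread of surprisal over the whole text.
    mean = statistics.fmean(surprisal)
    variance = statistics.pvariance(surprisal, mu=mean)

    # First differences describe how sharply surprisal changes
    # from one token to the next.
    deltas = [b - a for a, b in zip(surprisal, surprisal[1:])]
    abs_deltas = [abs(d) for d in deltas]

    return {
        "mean": mean,
        "variance": variance,
        "delta_mean_abs": statistics.fmean(abs_deltas),
        "delta_variance": statistics.pvariance(deltas),
    }


# Hypothetical inputs: a flat sequence versus a more erratic one.
flat = surprisal_features([-1.0, -1.0, -1.0, -1.0, -1.0])
varied = surprisal_features([-0.5, -3.0, -0.2, -4.0, -1.0])
print(flat["variance"], varied["variance"])
```

Under the abstract's motivating observation, human-authored text would tend to look more like the erratic sequence, with larger spread and larger token-to-token jumps, while LLM output would tend toward the flatter profile. A classifier over such statistics is one plausible way to use them.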


Key Contributions

  • DivEye: a zero-shot detector using interpretable surprisal-based diversity features that capture lexical/structural unpredictability fluctuations distinguishing human from LLM-generated text
  • Language- and model-agnostic detection requiring no access to the generator model, generalizing across domains, languages, and model families
  • Complementary signal that boosts existing detectors by up to 18.7% when used as an auxiliary feature, with demonstrated robustness to paraphrasing and adversarial attacks
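One way to read "auxiliary signal" is that the diversity features are appended to an existing detector's score and a lightweight classifier is fit on the combined vector. The sketch below illustrates that reading only; the excerpt does not say how the paper fuses the signals. The helper names (`augment`, `fit_logistic`, `predict_proba`), the toy data, and the choice of logistic regression are all assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def augment(base_score, diversity_features):
    """Concatenate an existing detector's score with diversity features."""
    return [base_score, *diversity_features]


def predict_proba(x, weights, bias):
    """Probability that the text is AI-generated (label 1)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit logistic regression with plain batch gradient descent."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data (hypothetical): the base detector is uninformative (score 0.5
# everywhere), while surprisal variance is low for AI text (label 1) and
# high for human text (label 0).
rows = [
    augment(0.5, [0.2]),
    augment(0.5, [0.3]),
    augment(0.5, [1.8]),
    augment(0.5, [2.0]),
]
labels = [1, 1, 0, 0]
weights, bias = fit_logistic(rows, labels)
print(predict_proba(augment(0.5, [0.25]), weights, bias) > 0.5)
```

In this toy setup the base score alone cannot separate the two classes, so any separation the fitted model achieves comes from the appended feature. That is the sense in which a complementary signal can lift an existing detector; the real gains reported in the paper depend on its own features and benchmarks.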

🛡️ Threat Analysis

Output Integrity Attack

DivEye is a novel AI-generated text detector that verifies content authenticity and provenance — directly targeting output integrity. It proposes new surprisal-based diversity features as a forensic signal, which squarely fits ML09's AI-generated content detection scope.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Datasets
multiple benchmarks (unspecified in excerpt)
Applications
ai-generated text detection, academic integrity, journalism, social media content moderation, business compliance