Diversity Boosts AI-Generated Text Detection
Advik Raj Basani, Pin-Yu Chen
Published on arXiv
2509.18880
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
DivEye outperforms existing zero-shot detectors by up to 33.2% and, when used as an auxiliary signal, improves the performance of existing detectors by up to 18.7%, while remaining robust to paraphrasing and adversarial attacks.
DivEye
Novel technique introduced
Detecting AI-generated text is an increasing necessity to combat misuse of LLMs in education, business compliance, journalism, and social media, where synthetic fluency can mask misinformation or deception. While prior detectors often rely on token-level likelihoods or opaque black-box classifiers, these approaches struggle against high-quality generations and offer little interpretability. In this work, we propose DivEye, a novel detection framework that captures how unpredictability fluctuates across a text using surprisal-based features. Motivated by the observation that human-authored text exhibits richer variability in lexical and structural unpredictability than LLM outputs, DivEye captures this signal through a set of interpretable statistical features. Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines across multiple benchmarks. DivEye is robust to paraphrasing and adversarial attacks, generalizes well across domains and models, and improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal. Beyond detection, DivEye provides interpretable insights into why a text is flagged, pointing to rhythmic unpredictability as a powerful and underexplored signal for LLM detection.
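To make "how unpredictability fluctuates across a text" concrete, here is a minimal sketch of surprisal-based statistics. It assumes per-token probabilities have already been obtained from some scoring language model, which is not shown. The function names `surprisals` and `diversity_features`, the input `token_probs`, and the particular statistics chosen are illustrative assumptions; this summary does not list DivEye's exact feature set.

```python
import math
from statistics import fmean, pvariance


def surprisals(token_probs):
    """Convert per-token probabilities into surprisal values, -log p (in nats)."""
    if any(not (0.0 < p <= 1.0) for p in token_probs):
        raise ValueError("token probabilities must lie in (0, 1]")
    return [-math.log(p) for p in token_probs]


def diversity_features(token_probs):
    """Summarize the level and the fluctuation of surprisal across a text.

    Hypothetical feature set for illustration only: the mean and variance
    of surprisal, plus statistics of its first and second differences.
    """
    s = surprisals(token_probs)
    if len(s) < 3:
        raise ValueError("need at least three tokens")
    d1 = [b - a for a, b in zip(s, s[1:])]    # token-to-token change
    d2 = [b - a for a, b in zip(d1, d1[1:])]  # change of that change
    mean = fmean(s)
    return {
        "mean": mean,
        "variance": pvariance(s, mu=mean),
        "delta_mean": fmean(d1),
        "delta_variance": pvariance(d1),
        "delta2_variance": pvariance(d2),
    }
```

Under this sketch, a sequence whose token probabilities swing between very likely and very unlikely yields a larger `variance` and `delta_variance` than a uniformly predictable one, which is the kind of contrast the abstract attributes to human versus LLM text.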
Key Contributions
- DivEye: a zero-shot detector using interpretable surprisal-based diversity features that capture lexical/structural unpredictability fluctuations distinguishing human from LLM-generated text
- Language- and model-agnostic detection requiring no access to the generator model, generalizing across domains, languages, and model families
- Complementary signal that boosts existing detectors by up to 18.7% when used as an auxiliary feature, with demonstrated robustness to paraphrasing and adversarial attacks
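One plausible reading of "auxiliary signal" is feature-level fusion: append diversity features to an existing detector's score and train a small classifier on the combined vector. The sketch below illustrates that idea with a tiny logistic regression in pure Python. The fusion scheme, the helper names (`augment`, `fit_logistic`, `predict`), and the hyperparameters are all assumptions made for illustration; this summary does not say how the authors combine the signals.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def augment(base_score, diversity):
    """Concatenate a base detector score with auxiliary diversity features."""
    return [base_score, *diversity]


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, dim = len(rows), len(rows[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the estimated probability that x is AI-generated (label 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

For example, if a hypothetical base detector returns an uninformative score of 0.5 for every text, a classifier trained on `augment(0.5, [surprisal_variance])` can still separate the classes whenever the variance feature differs between them.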
🛡️ Threat Analysis
DivEye is a novel AI-generated text detector that verifies content authenticity and provenance — directly targeting output integrity. It proposes new surprisal-based diversity features as a forensic signal, which squarely fits ML09's AI-generated content detection scope.