
GPTZero: Robust Detection of LLM-Generated Texts

George Alexandru Adam 1, Alexander Cui 1, Edwin Thomas 1, Emily Napier 1, Nazar Shmatko 1, Jacob Schnell 2, Jacob Junqi Tian 3,4, Alekhya Dronavalli, Edward Tian 1, Dongwon Lee 1,5

0 citations · 44 references · arXiv (Cornell University)


Published on arXiv · 2602.13042

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

GPTZero achieves state-of-the-art accuracy in distinguishing human-written from LLM-generated text, with demonstrated robustness to adversarial paraphrasing attacks

GPTZero

Novel technique introduced


While historical concerns about text authenticity revolved primarily around plagiarism, the advent of large language models (LLMs) has introduced a new challenge: distinguishing human-authored from AI-generated text. This shift raises significant concerns, including the undermining of skill evaluations, the mass production of low-quality content, and the proliferation of misinformation. Addressing these issues, we introduce GPTZero, a state-of-the-art industrial AI detection solution offering reliable discernment between human and LLM-generated text. Our key contributions include: introducing a hierarchical, multi-task architecture enabling a flexible taxonomy of human and AI texts; demonstrating state-of-the-art accuracy on a variety of domains with granular predictions; and achieving superior robustness to adversarial attacks and paraphrasing via multi-tiered automated red teaming. GPTZero offers accurate and explainable detection, and educates users on its responsible use, ensuring fair and transparent assessment of text.


Key Contributions

  • Hierarchical, multi-task architecture enabling a flexible taxonomy for classifying human vs. AI-generated text at multiple granularities
  • State-of-the-art detection accuracy across diverse domains with granular, explainable predictions
  • Superior robustness to adversarial attacks and paraphrasing achieved through multi-tiered automated red teaming
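This summary does not spell out how the hierarchical, multi-task taxonomy is wired; a minimal sketch of one plausible scheme, where a top-level human-vs-AI head and an AI-subtype head (the subtype names here are illustrative assumptions, not the paper's taxonomy) are combined so that leaf probabilities still sum to one:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hierarchical_predict(top_logits, ai_sub_logits):
    """Combine two classification heads into leaf-level probabilities.

    top_logits:    [human, ai] logits from a shared encoder (hypothetical).
    ai_sub_logits: logits over AI subtypes, e.g. [pure_ai, ai_paraphrased]
                   (illustrative labels only).
    Probabilities multiply down the tree, so the leaves sum to 1 and the
    human-vs-AI decision stays consistent with the subtype breakdown.
    """
    p_human, p_ai = softmax(top_logits)
    sub = softmax(ai_sub_logits)
    leaves = {"human": p_human}
    for name, p in zip(["pure_ai", "ai_paraphrased"], sub):
        leaves[name] = p_ai * p
    return leaves

probs = hierarchical_predict([0.2, 1.5], [1.0, 0.3])
assert abs(sum(probs.values()) - 1.0) < 1e-9
```

One appeal of this factorization is that coarse (human vs. AI) and fine-grained (which kind of AI text) predictions can be reported at whichever granularity a user needs without retraining separate models.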

🛡️ Threat Analysis

Output Integrity Attack

AI-generated text detection is a core ML09 concern — verifying whether content is human-authored or AI-generated directly addresses output integrity and content provenance. The paper also evaluates robustness against adversarial attacks (paraphrasing, red-teaming) targeting the detector.
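The robustness evaluation protocol is not detailed in this summary; a toy sketch of the general idea, measuring a detector's accuracy on clean texts versus the same texts after a paraphrasing attack (the detector, attack, and samples below are purely illustrative stand-ins, not the paper's method):

```python
def robustness_drop(detector, texts, labels, attack):
    """Return (clean_accuracy, attacked_accuracy) for a binary detector.

    detector: callable text -> 0 (human) or 1 (AI); illustrative only.
    attack:   callable text -> paraphrased text; illustrative only.
    """
    def accuracy(samples):
        return sum(detector(t) == y for t, y in zip(samples, labels)) / len(labels)
    clean = accuracy(texts)
    attacked = accuracy([attack(t) for t in texts])
    return clean, attacked

# Toy stand-ins: a keyword "detector" and a synonym-swap "paraphraser".
toy_detector = lambda t: 1 if "delve" in t else 0
toy_attack = lambda t: t.replace("delve", "dig")

texts = ["we delve into the topic", "humans wrote this essay"]
labels = [1, 0]
clean, attacked = robustness_drop(toy_detector, texts, labels, toy_attack)
# clean -> 1.0, attacked -> 0.5: the attack halves this brittle detector's accuracy
```

A large gap between the two numbers indicates a detector that an adversary can evade cheaply, which is exactly the failure mode the paper's multi-tiered red teaming is designed to close.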


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
ai-generated text detection, academic integrity, misinformation detection