
GPTZero: Robust Detection of LLM-Generated Texts

George Alexandru Adam 1, Alexander Cui 1, Edwin Thomas 1, Emily Napier 1, Nazar Shmatko 1, Jacob Schnell 2, Jacob Junqi Tian 3,4, Alekhya Dronavalli, Edward Tian 1, Dongwon Lee 1,5

0 citations · 44 references · arXiv (Cornell University)


Published on arXiv · 2602.13042

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

GPTZero achieves state-of-the-art accuracy in distinguishing human-written from LLM-generated text, with demonstrated robustness to adversarial paraphrasing attacks

GPTZero

Novel technique introduced


While historical concerns about text authenticity revolved primarily around plagiarism, the advent of large language models (LLMs) has introduced a new challenge: distinguishing human-authored from AI-generated text. This shift raises significant concerns, including the undermining of skill evaluations, the mass production of low-quality content, and the proliferation of misinformation. Addressing these issues, we introduce GPTZero, a state-of-the-art industrial AI detection solution offering reliable discernment between human and LLM-generated text. Our key contributions include: introducing a hierarchical, multi-task architecture enabling a flexible taxonomy of human and AI texts; demonstrating state-of-the-art accuracy on a variety of domains with granular predictions; and achieving superior robustness to adversarial attacks and paraphrasing via multi-tiered automated red teaming. GPTZero offers accurate and explainable detection, and educates users on its responsible use, ensuring fair and transparent assessment of text.


Key Contributions

  • Hierarchical, multi-task architecture enabling a flexible taxonomy for classifying human vs. AI-generated text at multiple granularities
  • State-of-the-art detection accuracy across diverse domains with granular, explainable predictions
  • Superior robustness to adversarial attacks and paraphrasing achieved through multi-tiered automated red teaming
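This summary does not spell out how the hierarchical, multi-task taxonomy is wired; a minimal sketch of one plausible scheme, where a top-level human-vs-AI head and an AI-subtype head (the subtype names here are illustrative assumptions, not the paper's taxonomy) are combined so that leaf probabilities still sum to one:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hierarchical_predict(top_logits, ai_sub_logits):
    """Combine two classification heads into leaf-level probabilities.

    top_logits:    [human, ai] logits from a shared encoder (hypothetical).
    ai_sub_logits: logits over AI subtypes, e.g. [pure_ai, ai_paraphrased]
                   (illustrative labels only).
    Probabilities multiply down the tree, so the leaves sum to 1 and the
    human-vs-AI decision stays consistent with the subtype breakdown.
    """
    p_human, p_ai = softmax(top_logits)
    sub = softmax(ai_sub_logits)
    leaves = {"human": p_human}
    for name, p in zip(["pure_ai", "ai_paraphrased"], sub):
        leaves[name] = p_ai * p
    return leaves

probs = hierarchical_predict([0.2, 1.5], [1.0, 0.3])
assert abs(sum(probs.values()) - 1.0) < 1e-9
```

One appeal of this factorization is that coarse (human vs. AI) and fine-grained (which kind of AI text) predictions can be reported at whichever granularity a user needs without retraining separate models.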

🛡️ Threat Analysis

Output Integrity Attack

AI-generated text detection is a core ML09 concern — verifying whether content is human-authored or AI-generated directly addresses output integrity and content provenance. The paper also evaluates robustness against adversarial attacks (paraphrasing, red-teaming) targeting the detector.
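The robustness evaluation protocol is not detailed in this summary; a toy sketch of the general idea, measuring a detector's accuracy on clean texts versus the same texts after a paraphrasing attack (the detector, attack, and samples below are purely illustrative stand-ins, not the paper's method):

```python
def robustness_drop(detector, texts, labels, attack):
    """Return (clean_accuracy, attacked_accuracy) for a binary detector.

    detector: callable text -> 0 (human) or 1 (AI); illustrative only.
    attack:   callable text -> paraphrased text; illustrative only.
    """
    def accuracy(samples):
        return sum(detector(t) == y for t, y in zip(samples, labels)) / len(labels)
    clean = accuracy(texts)
    attacked = accuracy([attack(t) for t in texts])
    return clean, attacked

# Toy stand-ins: a keyword "detector" and a synonym-swap "paraphraser".
toy_detector = lambda t: 1 if "delve" in t else 0
toy_attack = lambda t: t.replace("delve", "dig")

texts = ["we delve into the topic", "humans wrote this essay"]
labels = [1, 0]
clean, attacked = robustness_drop(toy_detector, texts, labels, toy_attack)
# clean -> 1.0, attacked -> 0.5: the attack halves this brittle detector's accuracy
```

A large gap between the two numbers indicates a detector that an adversary can evade cheaply, which is exactly the failure mode the paper's multi-tiered red teaming is designed to close.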


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
ai-generated text detection, academic integrity, misinformation detection