Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective
Chenwang Wu 1, Yiu-ming Cheung 1, Bo Han 1, Defu Lian 2
Published on arXiv (arXiv:2511.00988)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The easy-to-hard framework achieves significant detection gains over baselines across diverse practical scenarios, including cross-LLM generalization, paraphrase attacks, and mixed human-AI texts.
Easy2Hard
Novel technique introduced
Existing machine-generated text (MGT) detection methods implicitly treat labels as a "gold standard". However, we reveal boundary ambiguity in MGT detection, implying that traditional training paradigms are inexact. Moreover, the limits of human cognition and the superintelligence of detectors make such inexact learning widespread and inevitable. To this end, we propose an easy-to-hard enhancement framework that provides reliable supervision under these inexact conditions. Distinct from knowledge distillation, our framework employs an easy supervisor that targets the relatively simple task of longer-text detection (despite its weaker capability) to enhance the more challenging target detector. First, the longer texts targeted by the supervisor theoretically alleviate the impact of inexact labels, laying the foundation for reliable supervision. Second, by structurally incorporating the detector into the supervisor, we theoretically establish the supervisor as a lower performance bound for the detector. Thus, optimizing the supervisor indirectly optimizes the detector, ultimately approximating the underlying "gold" labels. Extensive experiments across diverse practical scenarios, including cross-LLM, cross-domain, mixed-text, and paraphrase-attack settings, demonstrate the framework's significant detection effectiveness. The code is available at: https://github.com/tmlr-group/Easy2Hard.
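The structural coupling the abstract describes can be sketched in miniature. The function names, the segment length, and the mean-aggregation rule below are illustrative assumptions, not the paper's exact formulation: a toy detector scores short segments, and the "easy" supervisor classifies a longer text purely by aggregating those segment scores, so any improvement to the supervisor's decision must route through the detector.

```python
import math

def detector_score(segment, weights):
    """Toy detector: maps a short text segment to a score in (0, 1).

    `weights` is a hypothetical per-token weight table standing in for
    learned detector parameters.
    """
    x = sum(weights.get(tok, 0.0) for tok in segment.split())
    return 1.0 / (1.0 + math.exp(-x))

def supervisor_score(long_text, weights, seg_len=5):
    """Easy supervisor over a longer text (an assumed aggregation scheme).

    The supervisor's output is built entirely from the detector's
    segment-level scores, so the detector is structurally incorporated
    into the supervisor: optimizing the supervisor's longer-text decision
    indirectly optimizes the segment-level detector.
    """
    toks = long_text.split()
    segments = [" ".join(toks[i:i + seg_len])
                for i in range(0, len(toks), seg_len)]
    scores = [detector_score(s, weights) for s in segments]
    return sum(scores) / len(scores)
```

In a trainable version one would backpropagate the supervisor's longer-text loss through this composition into the detector's parameters; the mean is used here only to keep the dependency explicit and runnable.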
Key Contributions
- Identifies 'boundary ambiguity' in MGT detection showing that standard label-based training paradigms are inherently inexact
- Proposes an easy-to-hard enhancement framework where a supervisor trained on easier (longer-text) tasks structurally lower-bounds and improves a harder target detector
- Theoretically and empirically demonstrates detection improvements across cross-LLM, cross-domain, mixed-text, and paraphrase attack scenarios
🛡️ Threat Analysis
Directly addresses AI-generated content detection — the paper proposes a novel training paradigm (easy-to-hard supervision) to improve the reliability and accuracy of machine-generated text detectors, which falls squarely under output integrity and content authenticity.