Reasoning-Aware AIGC Detection via Alignment and Reinforcement
Zhao Wang 1, Max Xiong 2, Jianxun Lian 3, Zhicheng Dou 1
Published on arXiv: 2604.19172
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves state-of-the-art performance across multiple benchmarks with interpretable reasoning chains, outperforming black-box detectors and general-purpose LLMs
Novel technique introduced: REVEAL
The rapid advancement and widespread adoption of Large Language Models (LLMs) have heightened the need for reliable AI-generated content (AIGC) detection, which remains challenging as models evolve. We introduce AIGC-text-bank, a comprehensive multi-domain dataset with diverse LLM sources and authorship scenarios, and propose REVEAL, a detection framework that generates interpretable reasoning chains before classification. Our approach uses a two-stage training strategy: supervised fine-tuning to establish reasoning capabilities, followed by reinforcement learning to improve accuracy, strengthen logical consistency, and reduce hallucinations. Extensive experiments show that REVEAL achieves state-of-the-art performance across multiple benchmarks, offering a robust and transparent solution for AIGC detection. The project is open-source at https://aka.ms/reveal.
Key Contributions
- AIGC-text-bank: large-scale multi-domain dataset with 66K human samples and 1.4M AI-generated samples from 12 LLMs, including AI-Native and AI-Polish scenarios
- REVEAL framework: two-stage training (SFT + RL) that generates interpretable reasoning chains before classification
- State-of-the-art detection performance with transparent, reasoning-based decisions across multiple benchmarks
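To make the "reasoning before classification" idea and the RL stage more concrete, here is a minimal sketch of how a reward might score a detector output. Everything in it is an assumption for illustration: the `Verdict:` output format, the label strings, the reward values, and the names `parse_output` and `reward` are hypothetical and are not taken from the paper or its code release.

```python
import re

# Hypothetical label set, loosely following the three authorship scenarios.
LABELS = {"human", "ai-native", "ai-polish"}

# Assumed output format: free-text reasoning, then a final "Verdict: <label>" line.
# The pattern requires non-empty reasoning BEFORE the verdict line.
VERDICT_RE = re.compile(
    r"^(?P<reasoning>.*\S)\s*\nVerdict:\s*(?P<label>[\w-]+)\s*$",
    re.DOTALL,
)


def parse_output(text):
    """Return (reasoning, label) or None if the output is malformed."""
    match = VERDICT_RE.match(text.strip())
    if match is None:
        return None
    label = match.group("label").lower()
    if label not in LABELS:
        return None
    return match.group("reasoning").strip(), label


def reward(text, gold_label):
    """Toy reward: penalize malformed outputs, reward format and correctness."""
    parsed = parse_output(text)
    if parsed is None:
        return -1.0  # no reasoning, no verdict line, or unknown label
    _, label = parsed
    format_bonus = 0.5  # well-formed: reasoning followed by a verdict
    accuracy = 1.0 if label == gold_label else 0.0
    return format_bonus + accuracy
```

Under these assumptions, an output such as `"The phrasing is uniform.\nVerdict: AI-Native"` scores higher when the gold label is `ai-native` than when it is `human`, and a bare verdict with no reasoning is penalized. A real reward aimed at logical consistency and hallucination reduction would need additional signals that this sketch does not model.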
🛡️ Threat Analysis
The paper addresses AI-generated content detection: determining whether text was written by a human, generated entirely by AI (AI-Native), or polished by AI (AI-Polish). This is content provenance and authenticity verification, which falls squarely under ML09.
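The three authorship categories can be represented as a small label type. The sketch below is illustrative only: the `Authorship` enum, its string values, and the decision to count AI-polished text as "involves AI" when collapsing to a binary label are assumptions made here, not choices documented in the source.

```python
from enum import Enum


class Authorship(Enum):
    """Hypothetical three-way label for the authorship scenarios."""
    HUMAN = "human"
    AI_NATIVE = "ai-native"   # text generated entirely by an LLM
    AI_POLISH = "ai-polish"   # human text subsequently polished by an LLM


def involves_ai(label):
    """Assumed policy: anything other than HUMAN counts as AI-involved."""
    return label is not Authorship.HUMAN


def to_binary(labels):
    """Collapse three-way labels to 0 (human) / 1 (AI-involved)."""
    return [int(involves_ai(label)) for label in labels]
```

A collapse like this is one way to compare a three-way detector against binary human-vs-AI baselines; whether polished text should count as AI-written is a policy decision that depends on the provenance use case.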