Reasoning-Aware AIGC Detection via Alignment and Reinforcement

The rapid advancement and widespread adoption of Large Language Models (LLMs) have made reliable detection of AI-generated content (AIGC) more urgent, and the task remains challenging as models evolve. We introduce AIGC-text-bank, a comprehensive multi-domain dataset with diverse LLM sources and authorship scenarios, and propose REVEAL, a detection framework that generates interpretable reasoning chains before classification. Our approach uses a two-stage training strategy: supervised fine-tuning to establish reasoning capabilities, followed by reinforcement learning to improve accuracy and logical consistency and to reduce hallucinations. Extensive experiments show that REVEAL achieves state-of-the-art performance across multiple benchmarks, offering a robust and transparent solution for AIGC detection. The project is open-source at https://aka.ms/reveal

Key Contributions

AIGC-text-bank: large-scale multi-domain dataset with 66K human samples and 1.4M AI-generated samples from 12 LLMs, covering both AI-Native and AI-Polish scenarios
REVEAL framework: two-stage training (SFT + RL) that generates interpretable reasoning chains before classification
State-of-the-art detection performance with transparent, reasoning-based decisions across multiple benchmarks
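
The "reasoning chain before classification" idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the REVEAL release: the `<reasoning>`/`<label>` output format, the names `parse_verdict` and `toy_reward`, and the reward values are all hypothetical. The sketch only shows how an output that reasons first and classifies second could be parsed into one of the three authorship classes, and how an RL stage might score it.

```python
import re
from typing import NamedTuple, Optional

# The three authorship classes named in the summary, lowercased.
LABELS = ("human", "ai-native", "ai-polish")


class Verdict(NamedTuple):
    reasoning: str
    label: Optional[str]  # None when the output is malformed


# Hypothetical output format (an assumption, not REVEAL's actual format):
#   <reasoning>...</reasoning><label>...</label>
_PATTERN = re.compile(
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*<label>(?P<label>.*?)</label>",
    re.DOTALL | re.IGNORECASE,
)


def parse_verdict(output: str) -> Verdict:
    """Split a model output into its reasoning chain and final label."""
    match = _PATTERN.search(output)
    if match is None:
        return Verdict(reasoning="", label=None)
    label = match.group("label").strip().lower()
    return Verdict(
        reasoning=match.group("reasoning").strip(),
        label=label if label in LABELS else None,
    )


def toy_reward(output: str, gold_label: str) -> float:
    """Illustrative reward only: penalize malformed outputs, reward correct labels."""
    verdict = parse_verdict(output)
    if verdict.label is None:
        return -1.0  # missing tags or unknown label
    return 1.0 if verdict.label == gold_label.lower() else 0.0
```

A real reward for the RL stage would also need to score logical consistency and hallucination in the reasoning chain, which this toy version does not attempt.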

🛡️ Threat Analysis

Output Integrity Attack

The paper addresses AI-generated content detection, specifically classifying text as human-written, fully AI-generated (AI-Native), or AI-polished (AI-Polish). This is content provenance and authenticity verification, which falls under ML09 (Output Integrity Attack).

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

inference_time

Datasets

AIGC-text-bank, M4, LOKI

Applications

2026, 0 citations
