tool 2026

Reasoning-Aware AIGC Detection via Alignment and Reinforcement

Zhao Wang 1, Max Xiong 2, Jianxun Lian 3, Zhicheng Dou 1

0 citations

Published on arXiv

2604.19172

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art performance across multiple benchmarks with interpretable reasoning chains, outperforming black-box detectors and general-purpose LLMs

REVEAL

Novel technique introduced


The rapid advancement and widespread adoption of Large Language Models (LLMs) have heightened the need for reliable AI-generated content (AIGC) detection, which remains challenging as models evolve. We introduce AIGC-text-bank, a comprehensive multi-domain dataset with diverse LLM sources and authorship scenarios, and propose REVEAL, a detection framework that generates interpretable reasoning chains before classification. Our approach uses a two-stage training strategy: supervised fine-tuning to establish reasoning capabilities, followed by reinforcement learning to improve accuracy and logical consistency and to reduce hallucinations. Extensive experiments show that REVEAL achieves state-of-the-art performance across multiple benchmarks, offering a robust and transparent solution for AIGC detection. The project is open source at https://aka.ms/reveal


Key Contributions

  • AIGC-text-bank: large-scale multi-domain dataset with 66K human samples and 1.4M AI-generated samples from 12 LLMs, including AI-Native and AI-Polish scenarios
  • REVEAL framework: two-stage training (SFT + RL) that generates interpretable reasoning chains before classification
  • State-of-the-art detection performance with transparent, reasoning-based decisions across multiple benchmarks
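To make the "reason first, then classify" idea concrete, here is a minimal Python sketch of how a detector response might be parsed and scored during the reinforcement-learning stage. Everything in it is an assumption for illustration: the `Verdict:` output format, the label strings, and the reward values are hypothetical and are not taken from the REVEAL paper, which this summary does not describe at that level of detail.

```python
import re
from typing import Optional

# Hypothetical label set, based on the three authorship scenarios named above.
LABELS = ("human", "ai-native", "ai-polish")

# Hypothetical output format: free-form reasoning, then one "Verdict: <label>" line.
VERDICT_RE = re.compile(
    r"^verdict:\s*(human|ai-native|ai-polish)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_verdict(response: str) -> Optional[str]:
    """Return the predicted label, or None if the response is malformed."""
    matches = VERDICT_RE.findall(response)
    # Require exactly one verdict so self-contradictory outputs are rejected.
    if len(matches) != 1:
        return None
    return matches[0].lower()


def has_reasoning(response: str) -> bool:
    """Check that some non-empty text precedes the verdict line."""
    match = VERDICT_RE.search(response)
    return match is not None and bool(response[: match.start()].strip())


def reward(response: str, gold: str) -> float:
    """Toy reward: penalize malformed output, reward a correct verdict.

    The values -1.0 / 0.0 / 1.0 are illustrative placeholders only.
    """
    pred = parse_verdict(response)
    if pred is None or not has_reasoning(response):
        return -1.0
    return 1.0 if pred == gold else 0.0
```

For example, `reward("Sentence lengths are unusually uniform.\nVerdict: AI-Native", "ai-native")` scores a well-formed, correct response, while a bare `"Verdict: human"` with no preceding reasoning is treated as malformed. A real reward for logical consistency or hallucination reduction would need considerably more than this format-and-accuracy check.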

🛡️ Threat Analysis

Output Integrity Attack

The paper addresses AI-generated content detection: determining whether text was written by a human, fully AI-generated (AI-Native), or AI-polished (AI-Polish). This is content provenance and authenticity verification, which falls squarely under ML09.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
AIGC-text-bank, M4, LOKI
Applications
ai-generated content detection, academic integrity, fraud prevention, authorship verification