MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
Junxi Wu 1,2, Jinpeng Wang 2, Zheng Liu 2, Bin Chen 3,4, Dongjian Hu 1, Hao Wu 5, Shu-Tao Xia 2,4
Published on arXiv
2509.02499
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
MoSEs achieves an 11.34% average improvement in detection performance over baselines and a 39.15% improvement in low-resource scenarios for AI-generated text detection.
MoSEs (Mixture of Stylistic Experts)
Novel technique introduced
The rapid advancement of large language models has intensified public concerns about their potential misuse. It is therefore important to build trustworthy AI-generated text detection systems. Existing methods neglect stylistic modeling and mostly rely on static thresholds, which greatly limits detection performance. In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework, which enables stylistics-aware uncertainty quantification through conditional threshold estimation. MoSEs contains three core components: the Stylistics Reference Repository (SRR), the Stylistics-Aware Router (SAR), and the Conditional Threshold Estimator (CTE). For an input text, SAR activates the appropriate reference data in SRR and provides them to CTE. CTE then jointly models the linguistic statistical properties and semantic features to dynamically determine the optimal threshold. Given a discrimination score, MoSEs yields a prediction label with a corresponding confidence level. Our framework achieves an average improvement of 11.34% in detection performance compared to baselines. More notably, MoSEs shows a larger improvement of 39.15% in the low-resource case. Our code is available at https://github.com/creator-xi/MoSEs.
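The difference between a static threshold and a conditional one can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-neighbour router, the midpoint threshold rule, the logistic confidence, and all names (`route`, `conditional_threshold`, `detect`, `REPOSITORY`) are assumptions chosen for illustration, and the toy embeddings and scores are made up. The sketch assumes a discrimination score where higher values indicate AI-generated text.

```python
import math

# Toy reference repository (hypothetical data): two writing styles whose
# discrimination scores sit on very different scales. label 0 = human, 1 = AI.
REPOSITORY = [
    {"embedding": [1.0, 0.0], "score": 0.2, "label": 0},  # style A, human
    {"embedding": [0.9, 0.1], "score": 0.6, "label": 1},  # style A, AI
    {"embedding": [0.0, 1.0], "score": 1.0, "label": 0},  # style B, human
    {"embedding": [0.1, 0.9], "score": 1.8, "label": 1},  # style B, AI
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query_embedding, repository, k):
    """Stand-in router: return the k references most similar in style."""
    ranked = sorted(
        repository,
        key=lambda ref: cosine(query_embedding, ref["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def conditional_threshold(references, default=0.0):
    """Stand-in estimator: midpoint between mean human and mean AI scores.

    Falls back to `default` when the activated references lack one class.
    """
    human = [r["score"] for r in references if r["label"] == 0]
    ai = [r["score"] for r in references if r["label"] == 1]
    if not human or not ai:
        return default
    return (sum(human) / len(human) + sum(ai) / len(ai)) / 2


def detect(query_embedding, score, repository, k=2, scale=1.0):
    """Return (label, confidence, threshold) for one input text."""
    references = route(query_embedding, repository, k)
    threshold = conditional_threshold(references)
    margin = score - threshold
    # Assumed confidence: logistic function of the distance to the threshold.
    confidence = 1.0 / (1.0 + math.exp(-abs(margin) / scale))
    label = 1 if margin > 0 else 0
    return label, confidence, threshold


# The same discrimination score, evaluated against two different styles.
print(detect([1.0, 0.05], 0.9, REPOSITORY))
print(detect([0.05, 1.0], 0.9, REPOSITORY))
```

In this toy setup the same score of 0.9 is labeled AI-generated when the input resembles style A and human-written when it resembles style B, which a single static threshold cannot express.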
Key Contributions
- MoSEs framework with three components (SRR, SAR, CTE) enabling stylistics-aware uncertainty quantification for AI-generated text detection
- Conditional Threshold Estimator that jointly models linguistic statistical properties and semantic features to dynamically determine optimal detection thresholds
- Significant gains over baselines: +11.34% average and +39.15% in low-resource settings
🛡️ Threat Analysis
The paper proposes a novel AI-generated text detection framework (MoSEs). Detecting whether text was produced by an LLM is a core ML09 output integrity and content provenance task. The contribution is a new detection architecture, not an application of existing methods to a new domain.