MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds

The rapid advancement of large language models has intensified public concerns about their potential misuse. It is therefore important to build trustworthy AI-generated text detection systems. Existing methods neglect stylistic modeling and mostly rely on static thresholds, which greatly limits detection performance. In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework, which enables stylistics-aware uncertainty quantification through conditional threshold estimation. MoSEs contains three core components: the Stylistics Reference Repository (SRR), the Stylistics-Aware Router (SAR), and the Conditional Threshold Estimator (CTE). For an input text, SAR activates the appropriate reference data in SRR and provides them to CTE. CTE then jointly models linguistic statistical properties and semantic features to dynamically determine the optimal threshold. Given a discrimination score, MoSEs outputs a predicted label together with its confidence level. Our framework achieves an average improvement of 11.34% in detection performance over baselines. More notably, MoSEs shows an even larger improvement of 39.15% in the low-resource setting. Our code is available at https://github.com/creator-xi/MoSEs.
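The routing step (the router selecting reference data from the repository for a given input) can be pictured as nearest-neighbour retrieval over stylistic feature vectors. The sketch below is a minimal illustration under stated assumptions, not the authors' SAR: the `Reference` record, the `cosine` and `route` helpers, the use of cosine similarity, and the fixed top-k cutoff are all hypothetical, since the abstract does not say how style is represented or how references are chosen.

```python
import math
from dataclasses import dataclass


@dataclass
class Reference:
    """One entry of a hypothetical stylistics reference repository."""
    style_vec: list[float]  # stylistic feature vector (representation is an assumption)
    score: float            # discrimination score computed for this reference text
    is_ai: bool             # ground-truth label: True if machine-generated


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route(query_vec: list[float], repository: list[Reference], k: int = 3) -> list[Reference]:
    """Return the k references whose style vectors are most similar to the query."""
    ranked = sorted(repository, key=lambda r: cosine(query_vec, r.style_vec), reverse=True)
    return ranked[:k]


# Usage with toy vectors (hypothetical): two style clusters, each with one AI and one human text.
repo = [
    Reference([1.0, 0.0], 0.9, True),
    Reference([0.9, 0.1], 0.4, False),
    Reference([0.0, 1.0], 0.3, True),
    Reference([0.1, 0.9], 0.1, False),
]
nearest = route([1.0, 0.05], repo, k=2)
print([r.score for r in nearest])
```

In this toy repository no single global threshold separates the labels (the AI text scores 0.3 in one cluster while a human text scores 0.4 in the other), but within each routed cluster a threshold exists. That is the intuition behind passing only the routed references, rather than the whole repository, to a downstream threshold estimator.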

Key Contributions

MoSEs framework with three components (SRR, SAR, CTE) enabling stylistics-aware uncertainty quantification for AI-generated text detection
Conditional Threshold Estimator that jointly models linguistic statistical properties and semantic features to dynamically determine optimal detection thresholds
Significant gains over baselines: +11.34% average and +39.15% in low-resource settings
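One simple way to realize a conditional threshold, assumed here for illustration and not taken from the paper, is a logistic regression over the discrimination score plus conditioning features. Solving the decision boundary w[0]*s + w[1:]·f + b = 0 for the score s gives an effective threshold s* = -(b + w[1:]·f) / w[0], which shifts with the features f, and the predicted probability doubles as a confidence. The feature layout, the function names (`fit_logreg`, `conditional_threshold`, `predict`), and the toy data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=1.0, epochs=3000):
    """Fit weights w and bias b by full-batch gradient descent on mean log loss.

    Each row of X is [score, *conditioning_features]; y holds 0 (human) or 1 (AI).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def conditional_threshold(w, b, features):
    """Score s* at which P(AI) = 0.5 for these conditioning features.

    Solves w[0]*s + sum(w[1:] * features) + b = 0 for s; assumes w[0] > 0.
    """
    return -(b + sum(wj * fj for wj, fj in zip(w[1:], features))) / w[0]


def predict(w, b, score, features):
    """Return (is_ai, confidence); equivalent to score >= s* when w[0] > 0."""
    p = sigmoid(w[0] * score + sum(wj * fj for wj, fj in zip(w[1:], features)) + b)
    is_ai = p >= 0.5
    return is_ai, (p if is_ai else 1.0 - p)


# Toy data (hypothetical): x = [score, style_flag]; style 1 inflates all scores by about +1.
X = [[0.0, 0], [0.2, 0], [0.3, 0], [0.7, 0], [0.8, 0], [1.0, 0],
     [1.0, 1], [1.2, 1], [1.3, 1], [1.7, 1], [1.8, 1], [2.0, 1]]
y = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
w, b = fit_logreg(X, y)
print(conditional_threshold(w, b, [0]), conditional_threshold(w, b, [1]))
```

On this toy data the learned threshold for style 1 sits above the one for style 0, so the same score of 1.0 is labelled AI in one style and human in the other, which a single static threshold cannot do. A learned model over richer statistical and semantic features would follow the same pattern; logistic regression is used here only because its threshold can be read off in closed form.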

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated text detection framework (MoSEs). Detecting whether text was produced by an LLM is a core ML09 output integrity and content provenance task, and the contribution is a new detection architecture rather than a mere application of existing methods to a new domain.

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

inference_time

Year

2025
