Defense · 2025

DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection

Guoxin Ma 1, Xiaoming Liu 1, Zhanhan Zhang 2, Chengzhengxu Li 1, Shengchao Liu 1, Yu Lan 1

0 citations · 32 references · arXiv

Published on arXiv · 2511.01192

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves average F1-score improvements of 1.39% in-domain and 5.32% out-of-domain over state-of-the-art MGT detection baselines across ten benchmark datasets.

DEER

Novel technique introduced


Detecting machine-generated text (MGT) has emerged as a critical challenge, driven by the rapid advancement of large language models (LLMs) capable of producing highly realistic, human-like content. However, the performance of current approaches often degrades significantly under domain shift. To address this challenge, we propose a novel framework designed to capture both domain-specific and domain-general MGT patterns through a two-stage Disentangled mixturE-of-ExpeRts (DEER) architecture. First, we introduce a disentangled mixture-of-experts module, in which domain-specific experts learn fine-grained, domain-local distinctions between human and machine-generated text, while shared experts extract transferable, cross-domain features. Second, to mitigate the practical limitation of unavailable domain labels during inference, we design a reinforcement learning-based routing mechanism that dynamically selects the appropriate experts for each input instance, effectively bridging the train-inference gap caused by domain uncertainty. Extensive experiments on five in-domain and five out-of-domain benchmark datasets demonstrate that DEER consistently outperforms state-of-the-art methods, achieving average F1-score improvements of 1.39% and 5.32% on in-domain and out-of-domain datasets respectively, along with accuracy gains of 1.35% and 3.61% respectively. Ablation studies confirm the critical contributions of both disentangled expert specialization and adaptive routing to model performance.
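The two-stage design described above — shared experts that always contribute cross-domain features plus a router that picks a domain-specific expert per instance — can be illustrated with a minimal numpy sketch. All dimensions, expert counts, and the single-linear-map experts below are hypothetical simplifications, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 16, 8                 # hypothetical input dim and expert output dim
N_DOMAIN, N_SHARED = 3, 1    # hypothetical expert counts

# Each expert is reduced to a single linear map for illustration.
domain_experts = [rng.normal(size=(D, H)) for _ in range(N_DOMAIN)]
shared_experts = [rng.normal(size=(D, H)) for _ in range(N_SHARED)]
router_w = rng.normal(size=(D, N_DOMAIN))   # instance-adaptive router weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dmoe_forward(x):
    """Combine always-active shared experts with one routed domain expert."""
    shared_out = np.mean([x @ W for W in shared_experts], axis=0)
    probs = softmax(x @ router_w)   # routing distribution over domain experts
    k = int(np.argmax(probs))       # greedy expert selection at inference
    domain_out = x @ domain_experts[k]
    return shared_out + probs[k] * domain_out, k, probs

x = rng.normal(size=D)
h, expert, probs = dmoe_forward(x)
```

The key structural point the sketch captures is that shared experts fire for every input (transferable features), while only the router-selected domain expert contributes its domain-local signal, weighted by the router's confidence.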


Key Contributions

  • Disentangled Mixture-of-Experts (DMoE) module separating domain-specific and shared cross-domain experts for MGT detection
  • Reinforcement learning-based routing mechanism that selects experts at inference without requiring domain labels
  • Consistent out-of-domain generalization improvements (5.32% F1, 3.61% accuracy) over SOTA MGT detectors across 10 benchmark datasets
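Because domain labels are unavailable at inference, the router must learn expert selection from a task-level signal. A generic REINFORCE-style policy-gradient update is one way to do this; the sketch below uses a hypothetical reward (1 if the routed detector would classify correctly, 0 otherwise) and does not reproduce the paper's exact reward design or training procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N_EXPERTS = 16, 4
theta = np.zeros((D, N_EXPERTS))   # router policy parameters (hypothetical)
LR = 0.5

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(x, reward_fn):
    """One policy-gradient step: sample an expert, score it, reinforce its log-prob."""
    probs = softmax(x @ theta)
    a = int(rng.choice(N_EXPERTS, p=probs))
    r = reward_fn(a)                    # stand-in for detection correctness
    # d log pi(a|x) / d theta = outer(x, one_hot(a) - probs)
    grad_logp = np.outer(x, -probs)
    grad_logp[:, a] += x
    theta[:] += LR * r * grad_logp
    return a, r

# Toy check: if expert 2 is always the right choice for this input,
# the router should learn to prefer it.
x_fixed = rng.normal(size=D)
for _ in range(300):
    reinforce_step(x_fixed, lambda a: 1.0 if a == 2 else 0.0)
```

Rewarded selections raise the sampled expert's logit for similar inputs, so over training the routing distribution concentrates on whichever expert helps detection for that region of input space — the same effect the paper attributes to its instance-adaptive routing, here in skeletal form.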

🛡️ Threat Analysis

Output Integrity Attack

Core contribution is AI-generated content detection — detecting machine-generated text is a canonical ML09 (Output Integrity) task. The paper proposes a novel detection architecture, not a domain application of existing tools.


Details

Domains
nlp
Model Types
transformer, llm
Threat Tags
inference_time
Applications
machine-generated text detection, ai content attribution