Mixture of Detectors: A Compact View of Machine-Generated Text Detection

Large Language Models (LLMs) are often said to be on the verge of surpassing human creativity, a claim that needs careful consideration. Recent developments raise critical questions about the authenticity of human work and the preservation of human creativity and innovation. This paper addresses machine-generated text detection across several scenarios: document-level binary classification, multiclass classification (generator attribution), sentence-level segmentation of human-AI collaborative text, and adversarial attacks aimed at reducing the detectability of machine-generated text. We introduce BMAS English, an English-language dataset covering four tasks: binary classification of human versus machine text; multiclass classification, which not only identifies machine-generated text but also tries to determine its generator; adversarial attacks, a common way of evading detection; and sentence-level segmentation, which predicts the boundaries between human-written and machine-generated text. We believe this work builds on previous research in Machine-Generated Text Detection (MGTD) in a more meaningful way.

Key Contributions

BMAS English dataset covering four MGT detection tasks: binary classification, multiclass/generator attribution, adversarial attack robustness (five attack types), and sentence-level mixed-text boundary detection
Comprehensive comparative evaluation of multiple supervised detector models across all four scenarios within a unified framework
Mixed-text boundary detection task with precise word-index boundary annotations for three human-AI interleaving patterns
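To make the word-index boundary annotations more concrete, here is a minimal sketch of how a boundary could be expanded into per-word labels and how a predicted boundary could be scored. The function names, the 0-based indexing convention, and the absolute-distance score are assumptions made for illustration; the summary does not specify the dataset's annotation format or its evaluation metric.

```python
from typing import List


def labels_from_boundaries(
    n_words: int, boundaries: List[int], first_label: str = "human"
) -> List[str]:
    """Expand word-index boundaries into one label per word.

    Assumed convention (not from the source): a boundary b is the 0-based
    index of the first word written by the *other* author.
    """
    flip = {"human": "machine", "machine": "human"}
    cuts = set(boundaries)
    labels = []
    current = first_label
    for i in range(n_words):
        # Switch author when we reach a boundary (index 0 cannot be a switch).
        if i in cuts and i > 0:
            current = flip[current]
        labels.append(current)
    return labels


def boundary_error(predicted: int, gold: int) -> int:
    """Distance in words between a predicted and an annotated boundary."""
    return abs(predicted - gold)


if __name__ == "__main__":
    words = "I wrote this part myself and the model continued from here".split()
    labels = labels_from_boundaries(len(words), boundaries=[5])
    for word, label in zip(words, labels):
        print(f"{word:10s} {label}")
    print("error:", boundary_error(predicted=7, gold=5))
```

Passing more than one boundary, or starting with `first_label="machine"`, covers texts where the authorship switches several times or begins with machine output.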

🛡️ Threat Analysis

Output Integrity Attack

The core focus is AI-generated text detection (binary and multiclass/generator attribution), the robustness of detectors against evasion techniques (synonym substitution, homoglyph replacement, paraphrasing), and mixed human-AI text boundary detection, all of which directly address output integrity and content provenance authentication.
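As an illustration of one of the evasion techniques named above, the following sketch shows a simple homoglyph replacement: some Latin letters are swapped for visually similar Cyrillic code points, so the text looks unchanged to a reader while the character sequence a detector sees is different. The mapping, the `rate` parameter, and the function name are assumptions for this example and are not taken from the paper's attack implementation.

```python
import random

# Latin letter -> visually similar Cyrillic letter (a small illustrative subset).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}


def homoglyph_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace each mappable character with its look-alike with probability `rate`."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    original = "machine generated text can escape detection"
    attacked = homoglyph_attack(original, rate=0.5)
    print(attacked)                       # looks the same on screen
    print(attacked == original)           # but the strings differ
    print(len(attacked) == len(original))
```

A common countermeasure, under the same assumptions, is to normalize look-alike characters back to a single script before passing text to the detector.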

Details

Domains

nlp

Model Types

llm

transformer

Threat Tags

inference_time

Datasets

BMAS English

Applications

2025 1 cit.
