CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection

Detecting machine-generated text (MGT) from contemporary Large Language Models (LLMs) is increasingly crucial amid risks such as disinformation and threats to academic integrity. Existing zero-shot detection paradigms, although practical, often exhibit significant deficiencies. Key challenges include: (1) superficial analyses that focus on a limited set of textual attributes, and (2) a lack of investigation into consistency across linguistic dimensions such as style, semantics, and logic. To address these challenges, we introduce the Collaborative Adversarial Multi-agent Framework (CAMF), a novel architecture built on multiple LLM-based agents. CAMF employs specialized agents in a three-phase process: Multi-dimensional Linguistic Feature Extraction, Adversarial Consistency Probing, and Synthesized Judgment Aggregation. This structured collaborative-adversarial process enables deep analysis of subtle, cross-dimensional textual incongruities that indicate non-human origin. Empirical evaluations show that CAMF significantly outperforms state-of-the-art zero-shot MGT detection techniques.

Key Contributions

CAMF: a collaborative-adversarial multi-agent framework for zero-shot machine-generated text (MGT) detection using specialized LLM-based agents
Three-phase detection pipeline: Multi-dimensional Linguistic Feature Extraction, Adversarial Consistency Probing, and Synthesized Judgment Aggregation
Empirical demonstration of superiority over state-of-the-art zero-shot MGT detectors with ablation studies validating each component
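
The three-phase pipeline listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` interface (a plain callable from prompt to reply), the prompt wording, the `Verdict` record, and the keyword-based parsing of the judge's reply are all hypothetical. Only the phase names and the three dimensions (style, semantics, logic) come from the summary itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical agent interface: any callable that maps a prompt to a reply.
# In practice this would wrap an LLM call; here it is left abstract.
Agent = Callable[[str], str]

# Linguistic dimensions named in the abstract.
DIMENSIONS = ("style", "semantics", "logic")


@dataclass
class Verdict:
    label: str                 # "machine" or "human"
    profiles: Dict[str, str]   # per-dimension output of phase 1
    critique: str              # output of phase 2


def extract_features(text: str, extractors: Dict[str, Agent]) -> Dict[str, str]:
    """Phase 1 (assumed form): one specialized agent profiles each dimension."""
    return {
        dim: agent(f"Describe the {dim} of the following text:\n{text}")
        for dim, agent in extractors.items()
    }


def probe_consistency(profiles: Dict[str, str], prober: Agent) -> str:
    """Phase 2 (assumed form): an adversarial agent looks for cross-dimensional conflicts."""
    joined = "\n".join(f"[{dim}] {desc}" for dim, desc in profiles.items())
    return prober(f"List inconsistencies between these profiles:\n{joined}")


def aggregate(profiles: Dict[str, str], critique: str, judge: Agent) -> Verdict:
    """Phase 3 (assumed form): a judge agent turns profiles and critique into a label."""
    reply = judge(
        f"Profiles: {profiles}\nCritique: {critique}\n"
        "Answer with 'machine' or 'human'."
    )
    # Naive parsing of the judge's reply; an assumption for illustration only.
    label = "machine" if "machine" in reply.lower() else "human"
    return Verdict(label=label, profiles=profiles, critique=critique)


def detect(text: str, extractors: Dict[str, Agent], prober: Agent, judge: Agent) -> Verdict:
    """Run the three phases in order."""
    profiles = extract_features(text, extractors)
    critique = probe_consistency(profiles, prober)
    return aggregate(profiles, critique, judge)
```

To try the sketch without a model, stub agents such as `lambda prompt: "machine"` can be passed for `prober` and `judge`, with one stub per entry of `DIMENSIONS` for `extractors`. Keeping each phase as a separate function makes it straightforward to remove or swap a phase, which is the kind of comparison an ablation study would need.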

🛡️ Threat Analysis

Output Integrity Attack

CAMF is a novel architecture for AI-generated text detection, so it falls under ML09 (Output Integrity Attack) as an output integrity and content provenance tool. The paper proposes a new detection framework rather than applying existing methods to a new domain, and it uses multi-agent adversarial probing to distinguish human-authored text from LLM-generated text.