CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection
Yue Wang, Liesheng Wei, Yuxiang Wang
Published on arXiv: 2508.11933
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
CAMF significantly outperforms state-of-the-art zero-shot MGT detection techniques by combining multi-dimensional linguistic feature extraction with adversarial consistency probing across style, semantics, and logic dimensions.
CAMF (Collaborative Adversarial Multi-agent Framework)
Novel technique introduced
Detecting machine-generated text (MGT) from contemporary Large Language Models (LLMs) is increasingly crucial amid risks like disinformation and threats to academic integrity. Existing zero-shot detection paradigms, despite their practicality, often exhibit significant deficiencies. Key challenges include: (1) superficial analyses focused on limited textual attributes, and (2) a lack of investigation into consistency across linguistic dimensions such as style, semantics, and logic. To address these challenges, we introduce the Collaborative Adversarial Multi-agent Framework (CAMF), a novel architecture using multiple LLM-based agents. CAMF employs specialized agents in a synergistic three-phase process: Multi-dimensional Linguistic Feature Extraction, Adversarial Consistency Probing, and Synthesized Judgment Aggregation. This structured collaborative-adversarial process enables a deep analysis of subtle, cross-dimensional textual incongruities indicative of non-human origin. Empirical evaluations demonstrate CAMF's significant superiority over state-of-the-art zero-shot MGT detection techniques.
Key Contributions
- CAMF: a collaborative-adversarial multi-agent framework for zero-shot machine-generated text (MGT) detection using specialized LLM-based agents
- Three-phase detection pipeline: Multi-dimensional Linguistic Feature Extraction, Adversarial Consistency Probing, and Synthesized Judgment Aggregation
- Empirical demonstration of superiority over state-of-the-art zero-shot MGT detectors with ablation studies validating each component
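The three-phase pipeline listed above can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the summary does not describe CAMF's prompts, agent roles, or aggregation rule, so everything here is an assumption made for illustration. In particular, `run_pipeline`, `Verdict`, `stub_extractor`, and `constant_prober` are hypothetical names, the agents are plain Python callables standing in for LLM calls, and the mean-plus-threshold aggregation is a placeholder for whatever judgment step the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical agent types: in a real system each callable would wrap an LLM call.
Extractor = Callable[[str], str]            # text -> profile for one dimension
Prober = Callable[[Dict[str, str]], float]  # profiles -> inconsistency score


@dataclass
class Verdict:
    score: float  # mean inconsistency score, clamped to [0, 1]
    label: str    # "machine" or "human"


def run_pipeline(
    text: str,
    extractors: Dict[str, Extractor],
    probers: List[Prober],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Verdict:
    # Phase 1: feature extraction, one profile per linguistic dimension.
    profiles = {dim: agent(text) for dim, agent in extractors.items()}

    # Phase 2: consistency probing. Each prober inspects the profiles and
    # reports how inconsistent they look across dimensions.
    scores = [min(1.0, max(0.0, prober(profiles))) for prober in probers]

    # Phase 3: judgment aggregation. A plain mean is an assumed placeholder.
    mean = sum(scores) / len(scores) if scores else 0.0
    return Verdict(score=mean, label="machine" if mean >= threshold else "human")


def stub_extractor(dimension: str) -> Extractor:
    # Stand-in for an LLM agent: returns a trivial description of the text.
    def agent(text: str) -> str:
        return f"{dimension} profile of {len(text.split())} words"
    return agent


def constant_prober(value: float) -> Prober:
    # Stand-in for an adversarial agent: always reports the same score.
    return lambda profiles: value


if __name__ == "__main__":
    extractors = {d: stub_extractor(d) for d in ("style", "semantics", "logic")}
    probers = [constant_prober(0.9), constant_prober(0.7)]
    print(run_pipeline("An example passage to classify.", extractors, probers))
```

The sketch treats higher cross-dimensional inconsistency as evidence of machine origin, following the abstract's statement that such incongruities are indicative of non-human text. Swapping the stubs for real LLM-backed agents would leave the three-phase control flow unchanged.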
🛡️ Threat Analysis
CAMF is a novel architecture for AI-generated text detection, so it falls squarely under ML09 as an output integrity and content provenance tool. The paper proposes a new detection framework rather than merely applying existing methods to a new domain, and it uses multi-agent adversarial probing to distinguish human-authored from LLM-generated text.