defense 2025

CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection

Yue Wang 1, Liesheng Wei 2, Yuxiang Wang 3

0 citations

Published on arXiv

2508.11933

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

CAMF significantly outperforms state-of-the-art zero-shot MGT detection techniques by combining multi-dimensional linguistic feature extraction with adversarial consistency probing across style, semantics, and logic dimensions.

CAMF (Collaborative Adversarial Multi-agent Framework)

Novel technique introduced


Detecting machine-generated text (MGT) from contemporary Large Language Models (LLMs) is increasingly crucial amid risks like disinformation and threats to academic integrity. Existing zero-shot detection paradigms, despite their practicality, often exhibit significant deficiencies. Key challenges include: (1) superficial analyses focused on limited textual attributes, and (2) a lack of investigation into consistency across linguistic dimensions such as style, semantics, and logic. To address these challenges, we introduce the Collaborative Adversarial Multi-agent Framework (CAMF), a novel architecture using multiple LLM-based agents. CAMF employs specialized agents in a synergistic three-phase process: Multi-dimensional Linguistic Feature Extraction, Adversarial Consistency Probing, and Synthesized Judgment Aggregation. This structured collaborative-adversarial process enables a deep analysis of subtle, cross-dimensional textual incongruities indicative of non-human origin. Empirical evaluations demonstrate CAMF's significant superiority over state-of-the-art zero-shot MGT detection techniques.


Key Contributions

  • CAMF: a collaborative-adversarial multi-agent framework for zero-shot machine-generated text (MGT) detection using specialized LLM-based agents
  • Three-phase detection pipeline: Multi-dimensional Linguistic Feature Extraction, Adversarial Consistency Probing, and Synthesized Judgment Aggregation
  • Empirical demonstration of superiority over state-of-the-art zero-shot MGT detectors with ablation studies validating each component
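
The summary names the three phases but does not give the prompts, the agent roles, or the aggregation rule. The following is a minimal Python sketch of how such a pipeline could be wired together, not the authors' implementation. Everything in it is an assumption made for illustration: the `Agent` callable type, the function names, the pairwise probing scheme, the CONFLICT/CONSISTENT reply convention, and the 0.5 threshold are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: an "agent" is any callable that maps a prompt to a text reply
# (in practice this would wrap a call to an LLM).
Agent = Callable[[str], str]

# The three linguistic dimensions named in the abstract.
DIMENSIONS = ("style", "semantics", "logic")


@dataclass
class Verdict:
    label: str            # "machine" or "human"
    score: float          # fraction of probes that reported a conflict
    findings: List[str]   # which dimension pairs were flagged


def extract_features(text: str, agents: Dict[str, Agent]) -> Dict[str, str]:
    """Phase 1 (assumed form): one specialized agent profiles each dimension."""
    return {
        dim: agents[dim](f"Describe the {dim} of this text:\n{text}")
        for dim in DIMENSIONS
    }


def probe_consistency(profiles: Dict[str, str], prober: Agent) -> List[str]:
    """Phase 2 (assumed form): a prober agent challenges each pair of profiles."""
    findings = []
    for a, b in itertools.combinations(DIMENSIONS, 2):
        reply = prober(
            f"Do these {a} and {b} profiles conflict? "
            f"Answer CONFLICT or CONSISTENT.\n{a}: {profiles[a]}\n{b}: {profiles[b]}"
        )
        if reply.strip().upper().startswith("CONFLICT"):
            findings.append(f"{a} vs {b}")
    return findings


def aggregate(findings: List[str], n_probes: int, threshold: float) -> Verdict:
    """Phase 3 (assumed form): turn probe results into a single judgment."""
    score = len(findings) / n_probes
    label = "machine" if score >= threshold else "human"
    return Verdict(label=label, score=score, findings=findings)


def detect(text: str, agents: Dict[str, Agent], prober: Agent,
           threshold: float = 0.5) -> Verdict:
    """Run the three phases in order on one input text."""
    profiles = extract_features(text, agents)
    findings = probe_consistency(profiles, prober)
    n_probes = len(list(itertools.combinations(DIMENSIONS, 2)))
    return aggregate(findings, n_probes, threshold)
```

Keeping each agent as a plain callable lets the pipeline be exercised with stub functions before any real model is attached, and keeps the three phases independently replaceable.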

🛡️ Threat Analysis

Output Integrity Attack

CAMF is a novel architecture for AI-generated text detection — it falls squarely in ML09 as an output integrity / content provenance tool. The paper proposes a new detection framework (not merely applying existing methods to a domain), using multi-agent adversarial probing to distinguish human-authored from LLM-generated text.


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Applications
machine-generated text detection, academic integrity verification, disinformation detection