
Published on arXiv

2511.11340

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Four teams submitted final results across both subtasks, with methods ranging from statistical baselines to fine-tuned transformer classifiers; the benchmark highlights the vulnerability of detectors to paraphrasing and prompt variation.

M-DAIGT

Novel technique introduced


The generation of highly fluent text by Large Language Models (LLMs) poses a significant challenge to information integrity and academic research. In this paper, we introduce the Multi-Domain Detection of AI-Generated Text (M-DAIGT) shared task, which focuses on detecting AI-generated text across multiple domains, particularly in news articles and academic writing. M-DAIGT comprises two binary classification subtasks: News Article Detection (NAD) (Subtask 1) and Academic Writing Detection (AWD) (Subtask 2). To support this task, we developed and released a new large-scale benchmark dataset of 30,000 samples, balanced between human-written and AI-generated texts. The AI-generated content was produced using a variety of modern LLMs (e.g., GPT-4, Claude) and diverse prompting strategies. A total of 46 unique teams registered for the shared task, of which four teams submitted final results. All four teams participated in both Subtask 1 and Subtask 2. We describe the methods employed by these participating teams and briefly discuss future directions for M-DAIGT.


Key Contributions

  • Large-scale 30,000-sample benchmark dataset balanced between human-written and AI-generated text across news and academic domains, using GPT-4 and Claude with diverse prompting strategies
  • Structured shared task (M-DAIGT) with two binary classification subtasks: News Article Detection (NAD) and Academic Writing Detection (AWD)
  • Comparative analysis of participating systems ranging from statistical methods to transformer-based detectors, documenting the current state of the art and remaining challenges
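The participating systems are described only at the level of "statistical methods to transformer-based detectors." As a concrete illustration of the statistical end of that spectrum, the sketch below implements a multinomial naive Bayes classifier over word counts for the binary human-vs-AI decision. This is a generic baseline assumed for illustration, not a system submitted to M-DAIGT, and the toy texts are invented:

```python
import math
from collections import Counter

def tokenize(text):
    """Whitespace tokenization after lowercasing (deliberately simple)."""
    return text.lower().split()

def train_nb(samples):
    """Fit a multinomial naive Bayes model.

    samples: list of (text, label) pairs, label in {"human", "ai"}.
    Returns a dict holding log-priors, per-class word counts, and vocab.
    """
    counts = {"human": Counter(), "ai": Counter()}
    docs = Counter()
    for text, label in samples:
        docs[label] += 1
        counts[label].update(tokenize(text))
    vocab = set(counts["human"]) | set(counts["ai"])
    total_docs = sum(docs.values())
    return {
        "prior": {c: math.log(docs[c] / total_docs) for c in counts},
        "counts": counts,
        "totals": {c: sum(counts[c].values()) for c in counts},
        "vocab_size": len(vocab),
    }

def predict(model, text):
    """Return the higher-scoring class under Laplace-smoothed likelihoods."""
    scores = {}
    v = model["vocab_size"]
    for c in ("human", "ai"):
        score = model["prior"][c]
        for tok in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a class.
            score += math.log(
                (model["counts"][c][tok] + 1) / (model["totals"][c] + v)
            )
        scores[c] = score
    return max(scores, key=scores.get)
```

In practice such a baseline would be trained on the 30,000-sample benchmark; it serves mainly as a floor against which transformer-based detectors, and their robustness to paraphrasing and prompt variation, can be measured.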

🛡️ Threat Analysis


Directly addresses AI-generated text detection — a core output integrity problem — by creating evaluation infrastructure (dataset + shared task) for distinguishing human-written from LLM-generated content in news articles and academic writing.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
M-DAIGT dataset (30K samples, human vs. AI-generated)
Applications
ai-generated text detection, news article authenticity verification, academic integrity monitoring