benchmark 2025

CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection

Yihan Chen 1,2, Jiawei Chen 1,2, Guozhao Mo 1,2, Xuanang Chen 1, Ben He 1,2, Xianpei Han 1, Le Sun 1


Published on arXiv · 2509.04460

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

CoCoDet outperforms existing style-reliant AI-text detectors on peer reviews and, in particular, remains robust when reviews are paraphrased to evade detection.

CoCoDet

Novel technique introduced


The growing integration of large language models (LLMs) into the peer review process poses risks to the fairness and reliability of scholarly evaluation. While LLMs offer reviewers valuable assistance with language refinement, there is growing concern over their use to generate substantive review content. Existing general-purpose AI-generated text detectors are vulnerable to paraphrasing attacks and struggle to distinguish surface-level language refinement from substantive content generation, suggesting that they rely primarily on stylistic cues. Applied to peer review, this limitation can unfairly cast suspicion on reviews that use permissible AI-assisted language enhancement while failing to catch deceptively humanized AI-generated reviews. To address this, we propose a paradigm shift from style-based to content-based detection. Specifically, we introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews covering six distinct modes of human-AI collaboration. We further develop CoCoDet, an AI review detector built on a multi-task learning framework, designed to detect AI involvement in review content more accurately and robustly. Our work offers a practical foundation for evaluating the use of LLMs in peer review and contributes to the development of more precise, equitable, and reliable detection methods for real-world scholarly applications. Our code and data will be publicly available at https://github.com/Y1hanChen/COCONUTS.


Key Contributions

  • CoCoNUTS: a fine-grained benchmark dataset covering six distinct modes of human-AI collaboration in peer review for evaluating AI-generated text detection
  • CoCoDet: a multi-task learning detector that focuses on review content rather than stylistic cues, improving robustness against paraphrasing attacks
  • Empirical demonstration that existing general AI-generated text detectors rely on stylistic features and fail to distinguish permissible language refinement from substantive AI content generation
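The multi-task framing behind CoCoDet can be illustrated with a toy sketch: a shared encoder feeds two classification heads, one predicting which of the six human-AI collaboration modes produced the review and one predicting binary AI involvement, trained under a weighted sum of the two losses. All names, dimensions, and the loss weighting below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MultiTaskDetector:
    """Toy multi-task detector: a shared linear encoder feeding a
    6-way collaboration-mode head and a binary AI-involvement head.
    Dimensions and initialization are purely illustrative."""

    def __init__(self, dim_in=32, dim_hidden=16, n_modes=6):
        self.W_shared = rng.normal(0, 0.1, (dim_in, dim_hidden))
        self.W_mode = rng.normal(0, 0.1, (dim_hidden, n_modes))
        self.W_ai = rng.normal(0, 0.1, (dim_hidden, 2))

    def forward(self, x):
        h = np.tanh(x @ self.W_shared)  # shared review representation
        return softmax(h @ self.W_mode), softmax(h @ self.W_ai)

    def loss(self, x, y_mode, y_ai, alpha=0.5):
        p_mode, p_ai = self.forward(x)
        ce = lambda p, y: -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
        # weighted sum of the two task losses: the multi-task objective
        return alpha * ce(p_mode, y_mode) + (1 - alpha) * ce(p_ai, y_ai)

det = MultiTaskDetector()
x = rng.normal(size=(4, 32))  # four fake review embeddings
loss = det.loss(x, np.array([0, 2, 5, 1]), np.array([0, 1, 1, 0]))
```

The design intuition is that forcing one shared representation to serve both the fine-grained mode prediction and the coarse AI/human decision pushes the encoder toward content-level features rather than surface style.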

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated text detection system (CoCoDet) and evaluation benchmark (CoCoNUTS) targeting LLM-generated peer review content. The paper explicitly addresses paraphrasing attacks that defeat style-reliant detectors and introduces a content-focused multi-task learning framework to improve robustness — this is output integrity/AI-generated content detection, not a domain-only application.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Datasets
CoCoNUTS (proposed)
Applications
academic peer review, ai-generated text detection