The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing?
Sadat Shahriar, Navid Ayoobi, Arjun Mukherjee
Published on arXiv
arXiv:2512.05311
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Detection performance declines by an average of 25.4% after five consecutive paraphrasing stages, with paraphrasing into simplified non-expert style contributing most to the erosion of distinguishable LLM signatures.
With the increasing reliance on LLMs as research agents, distinguishing between LLM- and human-generated ideas has become crucial for understanding the cognitive nuances of LLMs' research capabilities. While detecting LLM-generated text has been extensively studied, distinguishing human- vs LLM-generated scientific ideas remains an unexplored area. In this work, we systematically evaluate the ability of state-of-the-art (SOTA) machine learning models to differentiate between human- and LLM-generated ideas, particularly after successive paraphrasing stages. Our findings highlight the challenges SOTA models face in source attribution, with detection performance declining by an average of 25.4% after five consecutive paraphrasing stages. Additionally, we demonstrate that incorporating the research problem as contextual information improves detection performance by up to 2.97%. Notably, our analysis reveals that detection algorithms struggle significantly when ideas are paraphrased into a simplified, non-expert style, which contributes the most to the erosion of distinguishable LLM signatures.
Key Contributions
- First systematic evaluation of SOTA detectors distinguishing human vs LLM-generated scientific ideas across successive paraphrasing stages
- Demonstrates that iterative paraphrasing causes a 25.4% average detection performance drop, with simplified non-expert style contributing most to LLM signature erosion
- Shows that incorporating research problem context improves detection by up to 2.97%, suggesting semantic grounding helps attribution
🛡️ Threat Analysis
The paper focuses on detecting AI-generated content (LLM- vs human-generated scientific ideas) and systematically evaluates how iterative paraphrasing erodes the LLM signatures that detectors rely on, making it a direct study of output integrity and the robustness of AI-generated content attribution.
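The evaluation protocol described above, applying several consecutive paraphrasing stages and measuring the detector's score after each, can be sketched as a simple loop. This is a hypothetical illustration, not the authors' code: `paraphrase` and `detect_llm_prob` are toy stand-ins for the LLM-based paraphrasers (including the simplified non-expert style) and the SOTA detectors the paper actually evaluates.

```python
# Hypothetical sketch of an iterative-paraphrasing robustness check.
# `paraphrase` and `detect_llm_prob` are placeholders, NOT the paper's methods.

def paraphrase(text: str, style: str) -> str:
    # Placeholder: the study would call an LLM paraphraser here. As a toy
    # stand-in, the "simplified" style strips capitalization cues.
    return text.lower() if style == "simplified" else text

def detect_llm_prob(text: str) -> float:
    # Placeholder detector: a real evaluation would use a trained classifier.
    # This toy score depends on surface features that paraphrasing removes.
    return sum(c.isupper() for c in text) / max(len(text), 1)

def run_paraphrase_chain(idea: str, style: str, stages: int = 5) -> list[float]:
    """Apply `stages` consecutive paraphrases, recording the detector's
    score after each stage (index 0 is the unparaphrased idea)."""
    scores = [detect_llm_prob(idea)]
    for _ in range(stages):
        idea = paraphrase(idea, style)
        scores.append(detect_llm_prob(idea))
    return scores

scores = run_paraphrase_chain("An LLM-Generated Research Idea", "simplified")
# The detector signal erodes across stages as stylistic cues are stripped,
# mirroring (in toy form) the ~25.4% average drop the paper reports.
```

The point of the sketch is the experimental shape: each stage feeds the previous stage's output back into the paraphraser, so detectable artifacts accumulate losses multiplicatively rather than being reintroduced at each step.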