On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
Mingmeng Geng, Thierry Poibeau
Published on arXiv (arXiv:2510.20810)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Concludes that LLM-generated text detection is fundamentally limited by definitional inconsistency, benchmark inadequacy, and the blurring boundary between LLM and human text, rendering detectors unreliable as decisive tools in general practice
With the widespread use of large language models (LLMs), many researchers have turned their attention to detecting text generated by them. However, there is no consistent or precise definition of the detection target, namely "LLM-generated text". Differences in usage scenarios and the diversity of LLMs further increase the difficulty of detection. What is commonly treated as the detection target usually represents only a subset of the text that LLMs can potentially produce. Human edits to LLM outputs, together with the subtle influences that LLMs exert on their users, are blurring the line between LLM-generated and human-written text. Existing benchmarks and evaluation approaches do not adequately address the varied conditions under which detectors are applied in the real world. As a result, the numerical results of detectors are often misunderstood, and their significance is diminishing. Detectors therefore remain useful under specific conditions, but their results should be interpreted only as references rather than decisive indicators.
Key Contributions
- Critiques the inconsistent and overly broad definitions of "LLM-generated text" across the literature, showing that detection targets are typically only a narrow subset of what LLMs can produce
- Identifies fundamental gaps in existing benchmarks and evaluation methodologies that fail to reflect real-world usage conditions (human edits, LLM-influenced writing, model diversity)
- Argues that reliable LLM-generated text detection is not achievable in general practice, and existing detector results should be treated as references rather than definitive indicators
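The paper's central stance, that detector scores are soft references rather than verdicts, can be illustrated with a minimal sketch. The function name, score scale, and thresholds below are assumptions for illustration, not anything specified in the paper:

```python
# Hypothetical sketch: interpreting a detector's [0, 1] "LLM-likelihood" score
# cautiously, as the paper recommends. Threshold and margin values are
# illustrative assumptions, not calibrated quantities from the paper.

def interpret_score(score: float, threshold: float = 0.5, margin: float = 0.15) -> str:
    """Map a raw detector score to a hedged, non-decisive verdict."""
    if score >= threshold + margin:
        return "leans LLM-generated; corroborate with other evidence"
    if score <= threshold - margin:
        return "leans human-written; corroborate with other evidence"
    # Scores near the threshold carry little information, especially for
    # edited outputs or LLM-influenced human writing.
    return "inconclusive"

print(interpret_score(0.90))  # leans LLM-generated; corroborate with other evidence
print(interpret_score(0.52))  # inconclusive
```

The wide "inconclusive" band reflects the paper's argument: given definitional ambiguity and benchmark gaps, near-threshold scores should not be converted into binary decisions.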
🛡️ Threat Analysis
Directly addresses AI-generated text detection — an explicit ML09 topic — by analyzing the state of LLM-generated text detectors, critiquing their evaluation benchmarks, and assessing their real-world detectability.