On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
Mingmeng Geng, Thierry Poibeau
Published on arXiv (arXiv:2510.20810)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Concludes that LLM-generated text detection is fundamentally limited by definitional inconsistency, benchmark inadequacy, and the blurring boundary between LLM and human text, rendering detectors unreliable as decisive tools in general practice
With the widespread use of large language models (LLMs), many researchers have turned their attention to detecting text generated by them. However, there is no consistent or precise definition of the detection target, namely "LLM-generated text". Differences in usage scenarios and the diversity of LLMs further increase the difficulty of detection. What is commonly treated as the detection target usually represents only a subset of the text that LLMs can potentially produce. Human edits to LLM outputs, together with the subtle influences that LLMs exert on their users, are blurring the line between LLM-generated and human-written text. Existing benchmarks and evaluation approaches do not adequately address the varied conditions under which detectors are applied in the real world. As a result, the numerical results of detectors are often misunderstood, and their significance is diminishing. Detectors therefore remain useful under specific conditions, but their results should be interpreted only as references rather than decisive indicators.
Key Contributions
- Critiques the inconsistent and overly broad definitions of "LLM-generated text" across the literature, showing that detection targets are typically only a narrow subset of what LLMs can produce
- Identifies fundamental gaps in existing benchmarks and evaluation methodologies that fail to reflect real-world usage conditions (human edits, LLM-influenced writing, model diversity)
- Argues that reliable LLM-generated text detection is not achievable in general practice, and existing detector results should be treated as references rather than definitive indicators
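The paper's central stance, that detector scores are soft references rather than verdicts, can be illustrated with a minimal sketch. The function name, score scale, and thresholds below are assumptions for illustration, not anything specified in the paper:

```python
# Hypothetical sketch: interpreting a detector's [0, 1] "LLM-likelihood" score
# cautiously, as the paper recommends. Threshold and margin values are
# illustrative assumptions, not calibrated quantities from the paper.

def interpret_score(score: float, threshold: float = 0.5, margin: float = 0.15) -> str:
    """Map a raw detector score to a hedged, non-decisive verdict."""
    if score >= threshold + margin:
        return "leans LLM-generated; corroborate with other evidence"
    if score <= threshold - margin:
        return "leans human-written; corroborate with other evidence"
    # Scores near the threshold carry little information, especially for
    # edited outputs or LLM-influenced human writing.
    return "inconclusive"

print(interpret_score(0.90))  # leans LLM-generated; corroborate with other evidence
print(interpret_score(0.52))  # inconclusive
```

The wide "inconclusive" band reflects the paper's argument: given definitional ambiguity and benchmark gaps, near-threshold scores should not be converted into binary decisions.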
🛡️ Threat Analysis
Directly addresses AI-generated text detection — an explicit ML09 topic — by analyzing the state of LLM-generated text detectors, critiquing their evaluation benchmarks, and assessing their real-world detectability.