DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm
Xiaowei Zhu 1,2, Yubing Ren 1,2, Fang Fang 1,2, Qingfeng Tan 3,4, Shi Wang 1, Yanan Cao 1,2
Published on arXiv (arXiv:2509.15550)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves relative improvements of 5.55% in AUROC and 2.08% in F1 score over prior zero-shot methods across multiple benchmarks, processing each sample in under 0.8 seconds
DNA-DetectLLM
Novel technique introduced
The rapid advancement of large language models (LLMs) has blurred the line between AI-generated and human-written text. This progress brings societal risks such as misinformation, authorship ambiguity, and intellectual property concerns, underscoring the urgent need for reliable AI-generated text detection. However, recent advances in generative language modeling have produced significant overlap between the feature distributions of human-written and AI-generated text, eroding classification boundaries and making accurate detection increasingly difficult. To address these challenges, we propose a DNA-inspired perspective that leverages a repair-based process to directly and interpretably capture the intrinsic differences between human-written and AI-generated text. Building on this perspective, we introduce DNA-DetectLLM, a zero-shot method for distinguishing AI-generated from human-written text. The method constructs an ideal AI-generated sequence for each input, iteratively repairs non-optimal tokens, and quantifies the cumulative repair effort as an interpretable detection signal. Empirical evaluations demonstrate that our method achieves state-of-the-art detection performance and exhibits strong robustness against various adversarial attacks and input lengths. Specifically, DNA-DetectLLM achieves relative improvements of 5.55% in AUROC and 2.08% in F1 score across multiple public benchmark datasets. Code and data are available at https://github.com/Xiaoweizhu57/DNA-DetectLLM.
Key Contributions
- Introduces the mutation-repair paradigm for AI-generated text detection, analogizing LLM output to a DNA 'template strand' and human text to a 'mutated strand' with measurable deviations
- Proposes DNA-DetectLLM, a zero-shot detector that constructs an ideal AI-generated sequence for each input via greedy decoding and quantifies the cumulative token-repair effort as an interpretable detection signal
- Achieves 5.55% relative AUROC and 2.08% F1 improvements over prior methods across public benchmarks, with demonstrated robustness against adversarial attacks and variable input lengths
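The repair-effort idea above can be sketched in simplified form. This is not the authors' implementation: the paper describes an iterative repair process over a real scoring LM, whereas the toy below collapses it into a single pass that, at each position, measures the log-probability gap between the observed token and the greedy ("template strand") token. Tokens that deviate from the greedy choice are treated as "mutations", and the accumulated gap is the repair effort; the function name, the dictionary-based stand-in for an LM's per-position log-probabilities, and the length normalization are all illustrative assumptions.

```python
import math

def repair_effort(tokens, per_position_logprobs):
    """Hypothetical repair-effort score.

    tokens: the observed token at each position.
    per_position_logprobs: for each position, a dict mapping candidate
        token -> log-probability under some scoring LM (stand-in here).
    Returns the mean log-prob gap needed to "repair" each mutated token
    back to the greedy (argmax) token. Intuition from the paper: human
    text deviates from the greedy sequence more, so it needs more repair.
    """
    effort = 0.0
    for tok, dist in zip(tokens, per_position_logprobs):
        greedy = max(dist, key=dist.get)            # "template strand" token
        if tok != greedy:                           # a "mutation" w.r.t. the LM
            # Gap between the ideal token and the observed one; unseen
            # tokens get a small floor probability to keep the gap finite.
            effort += dist[greedy] - dist.get(tok, math.log(1e-9))
    return effort / max(len(tokens), 1)             # length-normalized score

# Toy per-position distributions standing in for a real LM's log-probs.
dists = [
    {"the": -0.1, "a": -2.5},
    {"cat": -0.3, "dog": -1.4},
]
ai_like = repair_effort(["the", "cat"], dists)      # matches greedy: effort 0.0
human_like = repair_effort(["a", "dog"], dists)     # deviates at both positions
```

In this toy setup the greedy-matching sequence scores 0.0 and the deviating one scores higher, mirroring the decision rule: larger cumulative repair effort suggests human authorship.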
🛡️ Threat Analysis
Directly addresses AI-generated content detection — proposes a novel detection method (not a domain application of existing methods) that verifies whether text was produced by an LLM, fitting the output integrity and content authenticity scope of ML09.