
DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm

Xiaowei Zhu 1,2, Yubing Ren 1,2, Fang Fang 1,2, Qingfeng Tan 3,4, Shi Wang 1, Yanan Cao 1,2



Published on arXiv (2509.15550)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves relative improvements of 5.55% in AUROC and 2.08% in F1 score over prior zero-shot methods across multiple public benchmarks, while processing each sample in under 0.8 seconds

DNA-DetectLLM

Novel technique introduced


The rapid advancement of large language models (LLMs) has blurred the line between AI-generated and human-written text. This progress brings societal risks such as misinformation, authorship ambiguity, and intellectual property concerns, highlighting the urgent need for reliable AI-generated text detection methods. However, recent advances in generative language modeling have resulted in significant overlap between the feature distributions of human-written and AI-generated text, blurring classification boundaries and making accurate detection increasingly challenging. To address the above challenges, we propose a DNA-inspired perspective, leveraging a repair-based process to directly and interpretably capture the intrinsic differences between human-written and AI-generated text. Building on this perspective, we introduce DNA-DetectLLM, a zero-shot detection method for distinguishing AI-generated and human-written text. The method constructs an ideal AI-generated sequence for each input, iteratively repairs non-optimal tokens, and quantifies the cumulative repair effort as an interpretable detection signal. Empirical evaluations demonstrate that our method achieves state-of-the-art detection performance and exhibits strong robustness against various adversarial attacks and input lengths. Specifically, DNA-DetectLLM achieves relative improvements of 5.55% in AUROC and 2.08% in F1 score across multiple public benchmark datasets. Code and data are available at https://github.com/Xiaoweizhu57/DNA-DetectLLM.


Key Contributions

  • Introduces the mutation-repair paradigm for AI-generated text detection, analogizing LLM output to a DNA 'template strand' and human text to a 'mutated strand' with measurable deviations
  • Proposes DNA-DetectLLM, a zero-shot detector that constructs an ideal AI sequence via greedy decoding per input and quantifies cumulative token-repair effort as an interpretable detection signal
  • Achieves 5.55% relative AUROC and 2.08% F1 improvements over prior methods across public benchmarks, with demonstrated robustness against adversarial attacks and variable input lengths
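The repair-effort idea above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's implementation: `toy_lm` is a stand-in for an LLM's next-token distribution, and `repair_score` measures, per position, how far the observed token's log-probability falls below the greedy (argmax) token's. Near-greedy (AI-like) text accumulates little repair effort; human-like text accumulates more.

```python
import math


def toy_lm(context):
    """Stand-in for an LLM's next-token softmax output.

    Returns {token: probability} for a tiny vocabulary; this toy model
    simply prefers repeating the last token. Purely illustrative.
    """
    vocab = ["a", "b", "c"]
    last = context[-1] if context else "a"
    probs = {t: 0.1 for t in vocab}
    probs[last] = 0.8
    return probs


def repair_score(tokens, lm):
    """Cumulative 'repair effort' per token, normalized by length.

    For each position, add the log-probability gap between the greedy
    (most likely) token and the token actually observed. A sequence that
    always picks the greedy token scores 0; deviations raise the score.
    """
    total = 0.0
    for i, tok in enumerate(tokens):
        probs = lm(tokens[:i])
        greedy_p = max(probs.values())
        observed_p = probs.get(tok, 1e-9)  # floor for out-of-vocab tokens
        total += math.log(greedy_p) - math.log(observed_p)
    return total / max(len(tokens), 1)
```

In this toy, `repair_score(["a", "a", "a"], toy_lm)` is 0 (every token is the greedy choice), while `["a", "b", "c"]` scores higher; thresholding such a score is the zero-shot detection signal the paper describes.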

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection — proposes a novel detection method (not a domain application of existing methods) that verifies whether text was produced by an LLM, fitting the output integrity and content authenticity scope of ML09.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
RAID, multiple public benchmark datasets
Applications
ai-generated text detection, authorship attribution, misinformation detection