
DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm

Xiaowei Zhu 1,2, Yubing Ren 1,2, Fang Fang 1,2, Qingfeng Tan 3,4, Shi Wang 1, Yanan Cao 1,2



Published on arXiv (2509.15550)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves relative improvements of 5.55% in AUROC and 2.08% in F1 score over prior zero-shot methods across multiple public benchmarks, while processing each sample in under 0.8 seconds

DNA-DetectLLM

Novel technique introduced


The rapid advancement of large language models (LLMs) has blurred the line between AI-generated and human-written text. This progress brings societal risks such as misinformation, authorship ambiguity, and intellectual property concerns, highlighting the urgent need for reliable AI-generated text detection methods. However, recent advances in generative language modeling have resulted in significant overlap between the feature distributions of human-written and AI-generated text, blurring classification boundaries and making accurate detection increasingly challenging. To address the above challenges, we propose a DNA-inspired perspective, leveraging a repair-based process to directly and interpretably capture the intrinsic differences between human-written and AI-generated text. Building on this perspective, we introduce DNA-DetectLLM, a zero-shot detection method for distinguishing AI-generated and human-written text. The method constructs an ideal AI-generated sequence for each input, iteratively repairs non-optimal tokens, and quantifies the cumulative repair effort as an interpretable detection signal. Empirical evaluations demonstrate that our method achieves state-of-the-art detection performance and exhibits strong robustness against various adversarial attacks and input lengths. Specifically, DNA-DetectLLM achieves relative improvements of 5.55% in AUROC and 2.08% in F1 score across multiple public benchmark datasets. Code and data are available at https://github.com/Xiaoweizhu57/DNA-DetectLLM.


Key Contributions

  • Introduces the mutation-repair paradigm for AI-generated text detection, analogizing LLM output to a DNA 'template strand' and human text to a 'mutated strand' with measurable deviations
  • Proposes DNA-DetectLLM, a zero-shot detector that constructs an ideal AI sequence via greedy decoding per input and quantifies cumulative token-repair effort as an interpretable detection signal
  • Achieves 5.55% relative AUROC and 2.08% F1 improvements over prior methods across public benchmarks, with demonstrated robustness against adversarial attacks and variable input lengths
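The repair-effort idea above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's implementation: `toy_lm` is a stand-in for an LLM's next-token distribution, and `repair_score` measures, per position, how far the observed token's log-probability falls below the greedy (argmax) token's. Near-greedy (AI-like) text accumulates little repair effort; human-like text accumulates more.

```python
import math


def toy_lm(context):
    """Stand-in for an LLM's next-token softmax output.

    Returns {token: probability} for a tiny vocabulary; this toy model
    simply prefers repeating the last token. Purely illustrative.
    """
    vocab = ["a", "b", "c"]
    last = context[-1] if context else "a"
    probs = {t: 0.1 for t in vocab}
    probs[last] = 0.8
    return probs


def repair_score(tokens, lm):
    """Cumulative 'repair effort' per token, normalized by length.

    For each position, add the log-probability gap between the greedy
    (most likely) token and the token actually observed. A sequence that
    always picks the greedy token scores 0; deviations raise the score.
    """
    total = 0.0
    for i, tok in enumerate(tokens):
        probs = lm(tokens[:i])
        greedy_p = max(probs.values())
        observed_p = probs.get(tok, 1e-9)  # floor for out-of-vocab tokens
        total += math.log(greedy_p) - math.log(observed_p)
    return total / max(len(tokens), 1)
```

In this toy, `repair_score(["a", "a", "a"], toy_lm)` is 0 (every token is the greedy choice), while `["a", "b", "c"]` scores higher; thresholding such a score is the zero-shot detection signal the paper describes.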

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection — proposes a novel detection method (not a domain application of existing methods) that verifies whether text was produced by an LLM, fitting the output integrity and content authenticity scope of ML09.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
RAID, multiple public benchmark datasets
Applications
ai-generated text detection, authorship attribution, misinformation detection