defense 2026

Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text

Hongyi Zhou 1, Jin Zhu 2, Kai Ye 3, Ying Yang 1, Erhan Xu 3, Chengchun Shi 3

2 citations · 75 references · arXiv


Published on arXiv: 2601.21895

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 57.8% to 80.6% relative improvement over the strongest baseline across different target LLMs (GPT, Claude, Gemini) in over 100 experimental settings

Learn-to-Distance

Novel technique introduced


Modern large language models (LLMs) such as GPT, Claude, and Gemini have transformed the way we learn, work, and communicate. Yet their ability to produce highly human-like text raises serious concerns about misinformation and academic integrity, creating an urgent need for reliable algorithms to detect LLM-generated content. In this paper, we start by presenting a geometric approach to demystify rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their generalization ability. Building on this insight, we introduce a novel rewrite-based detection algorithm that adaptively learns the distance between the original and rewritten text. Theoretically, we demonstrate that employing an adaptively learned distance function is more effective for detection than using a fixed distance. Empirically, we conduct extensive experiments with over 100 settings, and find that our approach demonstrates superior performance over baseline algorithms in the majority of scenarios. In particular, it achieves relative improvements from 57.8% to 80.6% over the strongest baseline across different target LLMs (e.g., GPT, Claude, and Gemini).


Key Contributions

  • Geometric framework demystifying rewrite-based detection algorithms and explaining their generalization ability
  • Novel rewrite-based detection algorithm ('Learn-to-Distance') that adaptively learns the distance between original and rewritten text rather than using a fixed distance function
  • Theoretical proof that an adaptively learned distance function is more effective for detection, supported by extensive empirical evaluation across 100+ settings
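The core idea behind rewrite-based detection is that LLM-generated text changes little when an LLM rewrites it, while human text changes more; the paper's contribution is to *learn* the distance used to measure that change rather than fixing one. A minimal sketch of that idea, using toy embedding vectors and a diagonal-weighted squared distance fit by logistic regression. All names here (`LearnedDistance`, `fixed_distance`, `make_pair`) are hypothetical, and the paper's actual distance parameterization and training objective are not reproduced:

```python
# Illustrative sketch (not the paper's implementation): learn a
# per-dimension weighted distance between original and rewritten
# text embeddings, so that small distances signal LLM-generated text.
import math
import random


def fixed_distance(u, v):
    """Baseline detector's distance: plain squared Euclidean."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class LearnedDistance:
    """d_w(u, v) = sum_i w_i * (u_i - v_i)^2, with w and a bias b
    fit by logistic regression so score ~ P(text is LLM-generated)."""

    def __init__(self, dim):
        self.w = [1.0] * dim
        self.b = 0.0

    def score(self, u, v):
        d = sum(wi * (a - b) ** 2 for wi, a, b in zip(self.w, u, v))
        # Small learned distance -> logit (b - d) is large -> high score.
        return 1.0 / (1.0 + math.exp(-(self.b - d)))

    def fit(self, pairs, labels, lr=0.05, epochs=300):
        for _ in range(epochs):
            for (u, v), y in zip(pairs, labels):
                g = self.score(u, v) - y  # grad of log-loss wrt logit
                diffs = [(a - b) ** 2 for a, b in zip(u, v)]
                for i, di in enumerate(diffs):
                    # d enters the logit with a minus sign, hence "+=".
                    self.w[i] += lr * g * di
                self.b -= lr * g


# Toy data: rewriting moves LLM text little along dimension 0 and
# human text a lot, while dimension 1 is pure noise for both.
random.seed(0)


def make_pair(is_llm):
    shift0 = random.gauss(0.1 if is_llm else 1.0, 0.05)
    noise1 = random.gauss(0.0, 1.0)
    return ([0.0, 0.0], [shift0, noise1]), 1 if is_llm else 0


data = [make_pair(i % 2 == 0) for i in range(200)]
pairs, labels = zip(*data)

det = LearnedDistance(dim=2)
det.fit(list(pairs), list(labels))
```

A fixed distance weights the noisy dimension as heavily as the informative one; the learned distance should up-weight dimension 0 and suppress dimension 1, which is the intuition behind the paper's claim that an adaptively learned distance outperforms a fixed one.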

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel passive detection algorithm for identifying LLM-generated text — a core output integrity/content provenance problem. The paper introduces a new geometric framework and learned distance function for rewrite-based detection, directly contributing to AI-generated content detection methodology.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
ai-generated text detection, academic integrity, misinformation detection