defense 2025

DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning

Yongxin He¹,²,³, Shan Zhang⁴,³, Yixuan Cao¹,²,³, Lei Ma⁴,³, Ping Luo¹,²,³

¹ Institute of Computing Technology, Chinese Academy of Sciences

² State Key Lab of AI Safety

³ University of Chinese Academy of Sciences

⁴ Institute of Automation, Chinese Academy of Sciences

1 citation · 86 references · arXiv

Published on arXiv

2510.17489

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

DETree improves hybrid text detection performance and significantly enhances OOD robustness, particularly in few-shot learning conditions, over binary and multi-class baselines

DETree (Hierarchical Affinity Tree)

Novel technique introduced

Detecting AI-involved text is essential for combating misinformation, plagiarism, and academic misconduct. However, AI text generation includes diverse collaborative processes (AI-written text edited by humans, human-written text edited by AI, and AI-generated text refined by other AI), where various or even new LLMs could be involved. Texts generated through these varied processes exhibit complex characteristics, presenting significant challenges for detection. Current methods model these processes rather crudely, primarily employing binary classification (purely human vs. AI-involved) or multi-class classification (treating human-AI collaboration as a new class). We observe that representations of texts generated through different processes exhibit inherent clustering relationships. Therefore, we propose DETree, a novel approach that models the relationships among different processes as a Hierarchical Affinity Tree structure, and introduces a specialized loss function that aligns text representations with this tree. To facilitate this learning, we developed RealBench, a comprehensive benchmark dataset that automatically incorporates a wide spectrum of hybrid texts produced through various human-AI collaboration processes. Our method improves performance in hybrid text detection tasks and significantly enhances robustness and generalization in out-of-distribution scenarios, particularly in few-shot learning conditions, further demonstrating the promise of training-based approaches in OOD settings. Our code and dataset are available at https://github.com/heyongxin233/DETree.
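The idea of aligning text representations with a tree over generation processes can be illustrated with a small sketch. Everything below is an assumption made for illustration: the four leaf names, the shape of the tree in `PATHS`, the 2-D toy embeddings, and the margin-based ranking loss are hypothetical stand-ins, not the Hierarchical Affinity Tree or the loss function from the paper. The sketch only shows the general principle that processes closer in the tree should end up with more similar representations.

```python
import math
from itertools import permutations

# Hypothetical tree for illustration only: each generation process is written
# as its path from an implicit root. This is NOT the tree learned in the paper.
PATHS = {
    "human": ("human",),
    "human_then_ai_edit": ("ai_involved", "human_written_ai_edited"),
    "ai_then_human_edit": ("ai_involved", "ai_written", "human_edited"),
    "ai_then_ai_refine": ("ai_involved", "ai_written", "ai_refined"),
}


def tree_distance(a, b):
    """Count edges between two leaves using their longest shared path prefix."""
    pa, pb = PATHS[a], PATHS[b]
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return len(pa) + len(pb) - 2 * shared


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def tree_ranking_loss(embeddings, labels, margin=0.1):
    """Assumed margin ranking loss: for each anchor i, a sample j that is closer
    to i in the tree than sample k should be more similar to i by `margin`."""
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        for j, k in permutations(range(n), 2):
            if i in (j, k):
                continue
            d_ij = tree_distance(labels[i], labels[j])
            d_ik = tree_distance(labels[i], labels[k])
            if d_ij < d_ik:
                gap = cosine(embeddings[i], embeddings[k]) - cosine(embeddings[i], embeddings[j])
                total += max(0.0, margin + gap)
                count += 1
    return total / count if count else 0.0


def unit(angle_deg):
    """Toy 2-D embedding on the unit circle."""
    r = math.radians(angle_deg)
    return (math.cos(r), math.sin(r))


# Toy embeddings whose angular layout follows the hypothetical tree.
embeddings = [unit(0), unit(60), unit(100), unit(110)]
aligned_labels = ["human", "human_then_ai_edit", "ai_then_human_edit", "ai_then_ai_refine"]
# Same embeddings with labels shuffled so the geometry contradicts the tree.
shuffled_labels = ["ai_then_ai_refine", "human", "ai_then_human_edit", "human_then_ai_edit"]

aligned_loss = tree_ranking_loss(embeddings, aligned_labels)
shuffled_loss = tree_ranking_loss(embeddings, shuffled_labels)
print("aligned:", aligned_loss, "shuffled:", shuffled_loss)
```

In this toy setup the loss vanishes when the embedding geometry respects the tree and becomes positive when it does not. A trained detector would replace the fixed 2-D points with encoder outputs and minimize such an objective by gradient descent; how DETree actually builds its tree and defines its loss is described in the paper and the linked repository.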

Key Contributions

DETree: a hierarchical affinity tree structure that models relationships among different human-AI collaborative text generation processes, with a specialized alignment loss function
RealBench: a comprehensive benchmark dataset automatically covering a wide spectrum of hybrid texts from AI-written+human-edited, human-written+AI-edited, and AI-refined-by-AI processes
Demonstrated improved robustness and generalization in out-of-distribution settings, including few-shot learning scenarios

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated content detection architecture (DETree with Hierarchical Affinity Tree structure) specifically targeting detection of LLM-involved text across diverse human-AI collaborative generation processes — a direct ML09 contribution to output integrity and content authenticity.

Details

Domains

nlp

Model Types

transformer, llm

Threat Tags

inference_time

Datasets

RealBench

Applications

ai-generated text detection, academic integrity, misinformation detection, plagiarism detection
