
Fine-Grained Detection of AI-Generated Text Using Sentence-Level Segmentation

Lekkala Sai Teja , Annepaka Yadagiri , Partha Pakray , Chukhu Chunka , Mangadoddi Srikar Vardhan

1 citation · 31 references · IJCNLP-AACL


Published on arXiv: 2509.17830

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

The Transformer-NN-CRF model with boundary-aware loss outperforms zero-shot detectors and prior SOTA models at detecting AI-generated text spans at sentence/token granularity in collaborative human-AI documents.

Transformer-NN-CRF

Novel technique introduced


The generation of text by Artificial Intelligence (AI) in high-stakes writing has become common practice, opening the door to misuse and abuse of AI at various levels. Traditional AI detectors rely on document-level classification, which struggles to identify AI content in hybrid or lightly edited texts designed to evade detection, making it hard to distinguish human-written from AI-generated passages. We propose a sentence-level sequence labeling model that detects transitions between human- and AI-generated text, leveraging nuanced linguistic signals overlooked by document-level classifiers. This method detects and segments AI- and human-written text within a single document at token-level granularity. Our model combines state-of-the-art pre-trained Transformer models with Neural Networks (NNs) and Conditional Random Fields (CRFs). The transformer extracts semantic and syntactic patterns, the neural network component captures enhanced sequence-level representations, and the CRF layer improves boundary predictions, strengthening sequence recognition and the identification of partitions between human- and AI-generated text. Evaluation is performed on two publicly available benchmark datasets containing collaborative human-AI texts. We compare against zero-shot detectors and existing state-of-the-art models, and conduct rigorous ablation studies, showing that this approach accurately detects the spans of AI text within a fully collaborative document. All source code and processed datasets are available in our GitHub repository.
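The CRF layer's role can be illustrated with a minimal Viterbi decoder over two labels (HUMAN/AI). This is a sketch, not the authors' implementation: the emission scores below are hand-written stand-ins for what the transformer encoder plus NN head would produce per token, and the transition matrix simply discourages frequent label flips.

```python
# Minimal Viterbi decoding for a two-label CRF (HUMAN=0, AI=1).
# Emission scores are illustrative assumptions, not the paper's model output.

LABELS = ["HUMAN", "AI"]

def viterbi(emissions, transitions):
    """emissions: per-token [score_HUMAN, score_AI];
    transitions[i][j]: score of moving from label i to label j."""
    n_labels = len(emissions[0])
    scores = list(emissions[0])        # best path score ending in each label
    backpointers = []
    for emit in emissions[1:]:
        new_scores, bp = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            bp.append(best_i)
        scores = new_scores
        backpointers.append(bp)
    # Backtrack the highest-scoring label sequence.
    best = max(range(n_labels), key=lambda j: scores[j])
    path = [best]
    for bp in reversed(backpointers):
        best = bp[best]
        path.append(best)
    path.reverse()
    return [LABELS[i] for i in path]

# Transitions penalise label flips, smoothing noisy per-token evidence.
transitions = [[1.0, -1.0], [-1.0, 1.0]]
emissions = [[2.0, 0.1], [1.5, 0.3], [0.9, 1.0], [0.2, 2.1], [0.1, 2.4]]
print(viterbi(emissions, transitions))
# → ['HUMAN', 'HUMAN', 'AI', 'AI', 'AI']
```

Note how the third token, whose emission scores are nearly tied, is pulled to AI because the transition scores favour staying in the segment its neighbours belong to; this is the structured smoothing a CRF adds over per-token classification.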


Key Contributions

  • Novel Transformer-NN-CRF architecture with dynamic dropout and a custom boundary-aware hierarchical loss, adapted for mixed-authorship segmentation rather than standard NER/POS tasks
  • Sentence- and token-level sequence labeling that identifies exact AI-generated spans inside hybrid human-AI documents, outperforming document-level classifiers on partially-edited texts
  • Rigorous comparison against zero-shot detectors and SOTA models on two public collaborative human-AI benchmark datasets, with ablation studies validating each architectural component
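The exact form of the paper's boundary-aware hierarchical loss is not given in this summary; a common way to realise the idea, shown here purely as an assumption, is to up-weight the per-token loss for tokens at or adjacent to a gold human/AI transition, so the model is penalised most for misplacing segment boundaries.

```python
import math

# Hypothetical boundary-aware weighting (illustrative only; not the
# authors' published loss). Tokens next to a label transition get a
# larger weight in the negative log-likelihood sum.

def boundary_weights(labels, boundary_weight=3.0):
    """Per-token weight: boundary_weight where the gold label differs
    from a neighbour's, 1.0 elsewhere."""
    w = []
    for i, y in enumerate(labels):
        near_boundary = (i > 0 and labels[i - 1] != y) or \
                        (i + 1 < len(labels) and labels[i + 1] != y)
        w.append(boundary_weight if near_boundary else 1.0)
    return w

def weighted_nll(probs, labels, boundary_weight=3.0):
    """probs[i][y]: model probability of label y at token i."""
    weights = boundary_weights(labels, boundary_weight)
    return sum(-w * math.log(p[y]) for w, p, y in zip(weights, probs, labels))

labels = [0, 0, 1, 1]            # HUMAN, HUMAN, AI, AI
print(boundary_weights(labels))  # → [1.0, 3.0, 3.0, 1.0]
```

The same weighting slots into any token-level cross-entropy; the hierarchy in the paper's loss presumably layers sentence- and token-level terms, which this sketch does not attempt to reproduce.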

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel forensic detection architecture for AI-generated content at token/sentence granularity — directly addresses output integrity and AI content provenance by identifying which spans of a document were machine-generated.


Details

Domains
nlp
Model Types
transformer, traditional_ml
Threat Tags
inference_time
Datasets
SemEval 2024 M4GT, RoFT
Applications
ai-generated text detection, academic integrity, authorship segmentation