SENTRA: Selected-Next-Token Transformer for LLM Text Detection

LLMs are becoming increasingly capable and widespread. Consequently, both the potential for their misuse and its actual occurrence are growing. In this work, we address the problem of detecting LLM-generated text that is not explicitly declared as such. We present a novel, general-purpose, supervised LLM text detector, SElected-Next-Token tRAnsformer (SENTRA). SENTRA is a Transformer-based encoder that operates on selected-next-token-probability sequences and uses contrastive pre-training on large amounts of unlabeled data. Our experiments on three popular public datasets across 24 domains of text demonstrate that SENTRA is a general-purpose classifier that significantly outperforms popular baselines in the out-of-domain setting.

Key Contributions

SENTRA: a Transformer-based encoder operating on selected-next-token-probability (SNTP) sequences from frozen LLMs, enabling generalizable LLM text detection without relying on raw tokens
Contrastive pre-training on large unlabeled data to learn domain-agnostic representations of token probability sequences, improving out-of-domain generalization
Empirical demonstration that SENTRA significantly outperforms supervised and unsupervised baselines in out-of-domain settings across 24 text domains and three benchmark datasets
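
To make the idea of an SNTP sequence concrete, here is a minimal sketch of how such a sequence might be derived from the outputs of a frozen language model. It assumes the feature at each position is the log-probability the model assigns to the token that actually comes next; the summary above does not spell out the exact feature definition, so this is an illustration and not the authors' implementation. The function names (`log_softmax`, `sntp_sequence`) and the toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized score list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sntp_sequence(logits_per_position, token_ids):
    """Return the log-probability of each observed next token.

    Assumption: logits_per_position[t] holds the frozen model's scores for
    the token at position t + 1, given the tokens up to and including t.
    A text of n tokens therefore yields n - 1 values.
    """
    if len(logits_per_position) != len(token_ids) - 1:
        raise ValueError("expected one logit vector per predicted token")
    return [
        log_softmax(logits_per_position[t])[token_ids[t + 1]]
        for t in range(len(token_ids) - 1)
    ]


# Toy example: vocabulary of 4 tokens, text of 3 tokens.
toy_logits = [
    [0.0, 0.0, 0.0, 0.0],  # uniform prediction for the second token
    [2.0, 0.0, 0.0, 0.0],  # model favours token 0 for the third token
]
toy_tokens = [1, 3, 0]
toy_sntp = sntp_sequence(toy_logits, toy_tokens)
```

A sequence like `toy_sntp` contains only probabilities, not the tokens themselves, which is what would let a downstream encoder work without relying on raw tokens.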

🛡️ Threat Analysis

Output Integrity Attack

SENTRA is a novel AI-generated content detection architecture — detecting whether text was LLM-generated falls squarely under output integrity and content provenance (ML09). The paper contributes a new detection technique (Transformer encoder on SNTP sequences with contrastive pre-training), not merely a domain application of existing detectors.

Details

Domains

nlp

Model Types

llm
transformer

Threat Tags

black_box
inference_time

Datasets

RAID

Applications

2026 0 cit.
