SENTRA: Selected-Next-Token Transformer for LLM Text Detection
Mitchell Plyler 1, Yilun Zhang 1, Alexander Tuzhilin 2,1, Saoud Khalifah 3, Sen Tian 2
Published on arXiv (arXiv:2509.12385)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
SENTRA outperforms all studied baselines in out-of-domain evaluation across 24 domains on three public benchmark datasets, demonstrating superior generalization to distribution shifts.
SENTRA (SElected-Next-Token tRAnsformer)
Novel technique introduced
LLMs are becoming increasingly capable and widespread. Consequently, both the potential for and the reality of their misuse are growing. In this work, we address the problem of detecting LLM-generated text that is not explicitly declared as such. We present a novel, general-purpose, supervised LLM text detector, the SElected-Next-Token tRAnsformer (SENTRA). SENTRA is a Transformer-based encoder that operates on selected-next-token-probability sequences and uses contrastive pre-training on large amounts of unlabeled data. Our experiments on three popular public datasets spanning 24 text domains demonstrate that SENTRA is a general-purpose classifier that significantly outperforms popular baselines in the out-of-domain setting.
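The core input feature is the selected-next-token-probability (SNTP) sequence: for each position in a text, the probability a frozen LLM assigned to the token that actually appeared next. The paper's exact feature construction is not spelled out in this summary, so the following is a minimal sketch assuming per-position logits from a frozen LM are already available; the function name `sntp_sequence` and the toy shapes are illustrative, not from the paper.

```python
import numpy as np

def sntp_sequence(logits, token_ids):
    """Sketch of computing an SNTP sequence (assumed formulation).

    logits:    (T, V) array of next-token logits from a frozen LM,
               where logits[t] scores the token observed at step t.
    token_ids: (T,) ids of the tokens that actually appeared.
    Returns the probability the LM assigned to each observed token.
    """
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Select, per position, the probability of the observed token.
    return probs[np.arange(len(token_ids)), token_ids]

# Toy example: 3 positions, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
observed = np.array([2, 0, 4])
sntp = sntp_sequence(logits, observed)
print(sntp)
```

The resulting length-T sequence of probabilities, rather than the raw tokens, is what the downstream Transformer encoder would consume, which is what makes the representation less tied to any one domain's vocabulary.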
Key Contributions
- SENTRA: a Transformer-based encoder operating on selected-next-token-probability (SNTP) sequences from frozen LLMs, enabling generalizable LLM text detection without relying on raw tokens
- Contrastive pre-training on large unlabeled data to learn domain-agnostic representations of token probability sequences, improving out-of-domain generalization
- Empirical demonstration that SENTRA significantly outperforms supervised and unsupervised baselines in out-of-domain settings across 24 text domains and three benchmark datasets
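The contrastive pre-training contribution above learns representations of token-probability sequences from unlabeled data. The paper's exact objective is not given in this summary; as one plausible sketch, an InfoNCE-style loss pulls together embeddings of two views of the same sequence and pushes apart mismatched pairs. Everything here (the function name, temperature, and toy data) is an illustrative assumption.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss (assumed objective, not
    necessarily the paper's exact loss).

    z1, z2: (N, D) embeddings of two views of the same N sequences.
    Matching rows are positives; all other pairs act as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature  # (N, N) similarity matrix
    # Row-wise log-softmax; the diagonal entries are the positives.
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(1)
anchor = rng.normal(size=(4, 8))
positive = anchor + 0.01 * rng.normal(size=(4, 8))  # near-identical view
loss = info_nce_loss(anchor, positive)
print(loss)
```

With nearly identical views the loss is small; replacing `positive` with unrelated random embeddings drives it toward log N, which is the intuition behind using such an objective to learn domain-agnostic structure before supervised fine-tuning.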
🛡️ Threat Analysis
SENTRA is a novel AI-generated-content detection architecture. Detecting whether text was LLM-generated falls squarely under output integrity and content provenance (OWASP ML09). The paper contributes a new detection technique (a Transformer encoder over SNTP sequences with contrastive pre-training), not merely a domain application of existing detectors.