defense 2025

SENTRA: Selected-Next-Token Transformer for LLM Text Detection

Mitchell Plyler 1, Yilun Zhang 1, Alexander Tuzhilin 2,1, Saoud Khalifah 3, Sen Tian 2

Published on arXiv

2509.12385

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SENTRA outperforms all studied baselines in out-of-domain evaluation across 24 domains on three public benchmark datasets, demonstrating superior generalization to distribution shifts.

SENTRA (SElected-Next-Token tRAnsformer)

Novel technique introduced


LLMs are becoming increasingly capable and widespread. Consequently, the potential and reality of their misuse are also growing. In this work, we address the problem of detecting LLM-generated text that is not explicitly declared as such. We present a novel, general-purpose, and supervised LLM text detector, SElected-Next-Token tRAnsformer (SENTRA). SENTRA is a Transformer-based encoder leveraging selected-next-token-probability sequences and utilizing contrastive pre-training on large amounts of unlabeled data. Our experiments on three popular public datasets across 24 domains of text demonstrate that SENTRA is a general-purpose classifier that significantly outperforms popular baselines in the out-of-domain setting.
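To make the term "selected-next-token-probability sequence" concrete, here is a minimal sketch of how such a sequence could be computed from a language model's output scores. It is an illustration under assumptions, not the paper's implementation: the function names are hypothetical, the logits are random stand-ins for a frozen LLM's output, and whether SENTRA consumes probabilities or log-probabilities (and from how many frozen models) is not stated in this summary.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def selected_next_token_logprobs(logits: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
    """Log-probability the model assigned to each token that actually came next.

    logits:    (T, V) array; row t holds the model's scores for position t + 1.
    token_ids: (T,) integer array of the tokens present in the text.
    Returns a (T - 1,) array, one value per observed next token.
    """
    log_probs = log_softmax(logits[:-1])  # the last row predicts a token we do not have
    targets = token_ids[1:]               # the token observed at each next position
    return log_probs[np.arange(len(targets)), targets]

# Toy stand-in for a frozen LLM (assumption): random logits over a 5-token vocabulary.
rng = np.random.default_rng(0)
token_ids = np.array([2, 0, 4, 1])
logits = rng.normal(size=(len(token_ids), 5))
sntp = selected_next_token_logprobs(logits, token_ids)
```

The resulting sequence has one entry per predicted position, so a detector working on it sees how surprising each observed token was to the scoring model rather than the raw tokens themselves.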


Key Contributions

  • SENTRA: a Transformer-based encoder operating on selected-next-token-probability (SNTP) sequences from frozen LLMs, enabling generalizable LLM text detection without relying on raw tokens
  • Contrastive pre-training on large unlabeled data to learn domain-agnostic representations of token probability sequences, improving out-of-domain generalization
  • Empirical demonstration that SENTRA significantly outperforms supervised and unsupervised baselines in out-of-domain settings across 24 text domains and three benchmark datasets
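The second contribution refers to contrastive pre-training, but this summary does not state the exact objective. As a hedged illustration only, the sketch below uses a generic InfoNCE-style loss, a common contrastive objective that may differ from what SENTRA actually uses. The function name, the temperature value, and the idea that each row pair comes from two views of the same text are all assumptions.

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (assumed stand-in, not the paper's stated objective).

    anchors, positives: (N, D) embeddings; row i of each is a matching pair,
    and every other row in the batch serves as a negative.
    """
    # Cosine similarity via L2-normalized embeddings.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = (a @ p.T) / temperature                      # (N, N) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)         # stabilize the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i (its own positive).
    return float(-np.mean(np.diag(log_prob)))

# Toy embeddings: matched pairs should score a much lower loss than mismatched ones.
emb = np.eye(4)
loss_matched = info_nce_loss(emb, emb)
loss_mismatched = info_nce_loss(emb, np.roll(emb, 1, axis=0))
```

An objective of this shape needs no human/LLM labels, which is consistent with the summary's point that pre-training uses large amounts of unlabeled data.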

🛡️ Threat Analysis

Output Integrity Attack

SENTRA is a novel architecture for detecting AI-generated content. Determining whether text was LLM-generated falls squarely under output integrity and content provenance (ML09). The paper contributes a new detection technique (a Transformer encoder on SNTP sequences with contrastive pre-training), not merely a domain application of existing detectors.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Datasets
RAID
Applications
llm-generated text detection, ai content authentication