defense arXiv Sep 15, 2025
Mitchell Plyler, Yilun Zhang, Alexander Tuzhilin et al. · Mozilla Corporation · Ciphero AI +1 more
Novel Transformer detector using selected-next-token probabilities and contrastive pre-training to identify LLM-generated text out-of-domain
Output Integrity Attack nlp
LLMs are becoming increasingly capable and widespread. Consequently, both the potential for their misuse and its actual occurrence are growing. In this work, we address the problem of detecting LLM-generated text that is not explicitly declared as such. We present a novel, general-purpose, supervised LLM text detector, SElected-Next-Token tRAnsformer (SENTRA). SENTRA is a Transformer-based encoder that takes selected-next-token-probability sequences as input and is pre-trained contrastively on large amounts of unlabeled data. Our experiments on three popular public datasets spanning 24 domains of text demonstrate that SENTRA is a general-purpose classifier that significantly outperforms popular baselines in the out-of-domain setting.
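To make the term "selected-next-token-probability sequence" concrete, the sketch below computes, for each position in a text, the log-probability that a language model assigned to the token that actually came next. It is only an illustration under stated assumptions: the function name `selected_next_token_logprobs`, the plain-list input format, and the use of log-probabilities are hypothetical choices, and the abstract does not specify which scoring model or exact features SENTRA uses.

```python
import math


def selected_next_token_logprobs(logits, token_ids):
    """Return log p(token_ids[t + 1] | prefix) for each position t.

    Assumed (hypothetical) input layout:
      logits[t]  -- list of unnormalized vocabulary scores produced after
                    reading token_ids[0..t], i.e. a prediction for position t + 1
      token_ids  -- the observed token ids of the text
    """
    selected = []
    for t in range(len(token_ids) - 1):
        row = logits[t]
        # Numerically stable log-sum-exp gives the log normalizer of the softmax.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        # Keep only the log-probability of the token that was actually observed.
        selected.append(row[token_ids[t + 1]] - log_z)
    return selected


# Toy usage: a 2-word vocabulary where the model favors token 1 three to one.
toy_logits = [[0.0, math.log(3.0)], [0.0, math.log(3.0)]]
toy_tokens = [0, 1, 0]
sequence = selected_next_token_logprobs(toy_logits, toy_tokens)
```

A sequence like this has one value per predicted token, so it can be fed to a downstream encoder in place of, or alongside, the raw text; how SENTRA combines such sequences is described in the paper itself, not here.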
llm transformer Mozilla Corporation · Ciphero AI · New York University