Defense · 2025

Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection

Siyuan Li 1, Xi Lin 1, Guangyan Li 2, Zehao Liu 1, Aodu Wulianghai 1, Li Ding 1, Jun Wu 1, Jianhua Li 1



Published on arXiv: 2508.06913

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves over 16% and 11% F1-score improvements on Gemini-1.5-Pro- and GPT-4-0613-generated text, respectively, compared to state-of-the-art baselines.

SentiDetect

Novel technique introduced


The rapid advancement of large language models (LLMs) has resulted in increasingly sophisticated AI-generated content, posing significant challenges in distinguishing LLM-generated text from human-written language. Existing detection methods, primarily based on lexical heuristics or fine-tuned classifiers, often suffer from limited generalizability and are vulnerable to paraphrasing, adversarial perturbations, and cross-domain shifts. In this work, we propose SentiDetect, a model-agnostic framework for detecting LLM-generated text by analyzing the divergence in sentiment distribution stability. Our method is motivated by the empirical observation that LLM outputs tend to exhibit emotionally consistent patterns, whereas human-written texts display greater emotional variability. To capture this phenomenon, we define two complementary metrics: sentiment distribution consistency and sentiment distribution preservation, which quantify stability under sentiment-altering and semantic-preserving transformations. We evaluate SentiDetect on five diverse datasets and a range of advanced LLMs, including Gemini-1.5-Pro, Claude-3, GPT-4-0613, and LLaMA-3.3. Experimental results demonstrate its superiority over state-of-the-art baselines, with over 16% and 11% F1-score improvements on Gemini-1.5-Pro and GPT-4-0613, respectively. Moreover, SentiDetect also shows greater robustness to paraphrasing, adversarial attacks, and text length variations, outperforming existing detectors in challenging scenarios.


Key Contributions

  • SentiDetect: a model-agnostic, training-free LLM-generated text detector based on sentiment distribution stability divergence
  • Two complementary unsupervised metrics — sentiment distribution consistency and sentiment distribution preservation — capturing emotional pattern stability under sentiment-altering and semantic-preserving transformations
  • Demonstrated robustness against paraphrasing, adversarial perturbations, and text length variations across five datasets and four LLM families
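The core idea above — that LLM-generated text holds a more stable sentiment distribution under perturbation than human writing — can be sketched as a divergence computation. The snippet below is a minimal illustration, not the paper's implementation: the toy word-list scorer, the sentence-level distribution, and the use of Jensen-Shannon divergence are all assumptions standing in for whatever sentiment model and divergence measure SentiDetect actually uses.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; SentiDetect would use a real sentiment model.
POS = {"good", "great", "love", "excellent", "happy"}
NEG = {"bad", "terrible", "hate", "awful", "sad"}

def sentence_sentiment(sentence: str) -> str:
    """Label one sentence positive/negative/neutral via the toy lexicon."""
    words = sentence.lower().split()
    pos = sum(w in POS for w in words)
    neg = sum(w in NEG for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def sentiment_distribution(text: str) -> dict:
    """Per-sentence sentiment label frequencies, normalized to sum to 1."""
    sentences = [s for s in text.split(".") if s.strip()]
    counts = Counter(sentence_sentiment(s) for s in sentences)
    total = sum(counts.values())
    return {k: counts.get(k, 0) / total
            for k in ("positive", "negative", "neutral")}

def js_divergence(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence (base 2) between two distributions."""
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)
    m = {k: 0.5 * (p[k] + q[k]) for k in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def stability_score(original: str, transformed: str) -> float:
    """Divergence between sentiment distributions before and after a
    transformation; lower values mean a more stable (more LLM-like,
    under the paper's hypothesis) emotional profile."""
    return js_divergence(sentiment_distribution(original),
                         sentiment_distribution(transformed))
```

In this framing, the two metrics correspond to running `stability_score` with different transformation families: a sentiment-altering rewrite for consistency, and a semantic-preserving paraphrase for preservation.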

🛡️ Threat Analysis

Output Integrity Attack

SentiDetect is a novel AI-generated text detection framework and a direct contribution to ML09 (output integrity), proposing new forensic metrics — sentiment distribution consistency and sentiment distribution preservation — to distinguish LLM-generated from human-written text.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Datasets
review dataset, news dataset, code dataset, essays dataset, academic papers dataset
Applications
ai-generated text detection, authorship attribution, misinformation detection