tool 2025

Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection

Chi Wang 1, Min Gao 1, Zongwei Wang 1, Junwei Yin 1, Kai Shu 2, Chenghua Lin 3

0 citations

Published on arXiv

2508.12632

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

LIFE achieves state-of-the-art performance on LLM-generated fake news detection while maintaining high accuracy on human-written fake news

LIFE (Linguistic Fingerprints Extraction)

Novel technique introduced


With the rapid development of large language models, the generation of fake news has become increasingly effortless, posing a growing societal threat and underscoring the urgent need for reliable detection methods. Early efforts to identify LLM-generated fake news have predominantly focused on the textual content itself; however, because much of that content may appear coherent and factually consistent, the subtle traces of falsification are often difficult to uncover. Through distributional divergence analysis, we uncover prompt-induced linguistic fingerprints: statistically distinct probability shifts between LLM-generated real and fake news when maliciously prompted. Based on this insight, we propose a novel method named Linguistic Fingerprints Extraction (LIFE). By reconstructing word-level probability distributions, LIFE can find discriminative patterns that facilitate the detection of LLM-generated fake news. To further amplify these fingerprint patterns, we also leverage key-fragment techniques that accentuate subtle linguistic differences, thereby improving detection reliability. Our experiments show that LIFE achieves state-of-the-art performance in detecting LLM-generated fake news and maintains high performance in detecting human-written fake news. The code and data are available at https://anonymous.4open.science/r/LIFE-E86A.


Key Contributions

  • Discovery of prompt-induced linguistic fingerprints: statistically distinct word-level probability distribution shifts between LLM-generated real and fake news under malicious prompting
  • LIFE (Linguistic Fingerprints Extraction) method that reconstructs word-level probability distributions to surface discriminative patterns for fake news detection
  • Key-fragment technique that selectively amplifies subtle linguistic differences in critical content segments, improving detection reliability
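
To make the idea of a distributional divergence between two sets of texts concrete, here is a minimal Python sketch. It is not the LIFE implementation. As a stand-in for the LLM-derived word-level probabilities described above, it assumes smoothed unigram frequencies, and it uses Jensen-Shannon divergence as one common choice of divergence measure. The function names (`word_distribution`, `js_divergence`, `top_shifted`) and the toy sentences are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def word_distribution(texts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary.

    Assumption: unigram counts stand in for the model-derived
    word-level probabilities that the summary describes.
    """
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] with log base 2."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def top_shifted(p, q, k=3):
    """Words whose probability differs most between p and q (by |log ratio|)."""
    return sorted(p, key=lambda w: abs(math.log2(p[w] / q[w])), reverse=True)[:k]

# Toy corpora, invented for illustration only.
real_texts = [
    "the council approved the budget on tuesday",
    "officials confirmed the budget vote",
]
fake_texts = [
    "shocking secret budget scandal exposed",
    "officials hide shocking secret vote",
]

vocab = sorted({w for t in real_texts + fake_texts for w in t.lower().split()})
p_real = word_distribution(real_texts, vocab)
p_fake = word_distribution(fake_texts, vocab)

score = js_divergence(p_real, p_fake)
shifted = top_shifted(p_real, p_fake, k=3)
print(f"JS divergence: {score:.4f}")
print("Most shifted words:", shifted)
```

A detector in this spirit would treat such shifts as features rather than as a final verdict; how LIFE reconstructs the distributions and selects key fragments is specified in the paper and its released code, not in this sketch.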

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated text detection method that forensically identifies LLM-generated fake news via probability distribution divergence and linguistic fingerprint extraction. The method directly targets the integrity and authenticity of LLM-generated content.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
fake news detection, llm-generated text detection, misinformation detection