Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection
Chi Wang 1, Min Gao 1, Zongwei Wang 1, Junwei Yin 1, Kai Shu 2, Chenghua Lin 3
Published on arXiv: 2508.12632
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
LIFE achieves state-of-the-art performance on LLM-generated fake news detection while maintaining high accuracy on human-written fake news
Novel technique introduced
LIFE (Linguistic Fingerprints Extraction)
With the rapid development of large language models (LLMs), generating fake news has become increasingly effortless, posing a growing societal threat and underscoring the urgent need for reliable detection methods. Early efforts to identify LLM-generated fake news have predominantly focused on the textual content itself; however, because much of that content may appear coherent and factually consistent, the subtle traces of falsification are often difficult to uncover. Through distributional divergence analysis, we uncover prompt-induced linguistic fingerprints: statistically distinct probability shifts between LLM-generated real and fake news when the model is maliciously prompted. Based on this insight, we propose a novel method named Linguistic Fingerprints Extraction (LIFE). By reconstructing word-level probability distributions, LIFE can find discriminative patterns that facilitate the detection of LLM-generated fake news. To further amplify these fingerprint patterns, we also leverage key-fragment techniques that accentuate subtle linguistic differences, thereby improving detection reliability. Our experiments show that LIFE achieves state-of-the-art performance in detecting LLM-generated fake news and maintains high performance on human-written fake news. The code and data are available at https://anonymous.4open.science/r/LIFE-E86A.
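As a rough illustration of what a distributional divergence analysis can look like, the sketch below compares smoothed word distributions from two toy corpora using Jensen-Shannon divergence. It is not the paper's procedure: per the abstract, LIFE reconstructs word-level probability distributions, whereas this sketch simply counts words. The toy texts and the names `word_distribution`, `js_divergence`, and `top_shifted_words` are assumptions made for illustration.

```python
import math
from collections import Counter


def word_distribution(texts, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q must share keys and be strictly positive."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (symmetric, bounded by 1)."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def top_shifted_words(p, q, k=3):
    """Words whose probability shifts most between p and q (by |log-ratio|)."""
    return sorted(p, key=lambda w: abs(math.log2(p[w] / q[w])), reverse=True)[:k]


# Hypothetical toy corpora standing in for "real" and "fake" generations.
real_texts = [
    "the council approved the budget on monday",
    "officials said the budget vote passed",
]
fake_texts = [
    "shocking secret budget scandal exposed",
    "officials hide the shocking truth about the vote",
]

vocab = sorted({w for t in real_texts + fake_texts for w in t.split()})
p_real = word_distribution(real_texts, vocab)
p_fake = word_distribution(fake_texts, vocab)

print(f"JS divergence: {js_divergence(p_real, p_fake):.4f} bits")
print("Most shifted words:", top_shifted_words(p_real, p_fake))
```

Jensen-Shannon divergence is used here only because it is symmetric and bounded; a different divergence measure could be substituted without changing the overall shape of the analysis.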
Key Contributions
- Discovery of prompt-induced linguistic fingerprints: statistically distinct word-level probability distribution shifts between LLM-generated real and fake news under malicious prompting
- LIFE (Linguistic Fingerprints Extraction) method that reconstructs word-level probability distributions to surface discriminative patterns for fake news detection
- Key-fragment technique that selectively amplifies subtle linguistic differences in critical content segments, improving detection reliability
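One way to picture the key-fragment idea is as a search for the span of text where the fingerprint signal is strongest. The sketch below assumes each word already carries a numeric score (for example, a probability shift) and picks the fixed-width window with the largest total magnitude. The scoring scheme, the window rule, and the function name `key_fragment` are hypothetical and are not taken from the paper, which may select fragments quite differently.

```python
def key_fragment(scores, width):
    """Return (start, end) of the contiguous window of `width` words
    whose absolute scores sum to the largest value."""
    if width <= 0 or width > len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    magnitudes = [abs(s) for s in scores]
    window = sum(magnitudes[:width])
    best, best_start = window, 0
    # Slide the window one word at a time, updating the running sum.
    for i in range(width, len(magnitudes)):
        window += magnitudes[i] - magnitudes[i - width]
        if window > best:
            best, best_start = window, i - width + 1
    return best_start, best_start + width


# Hypothetical sentence and per-word scores, for illustration only.
words = "officials said the miracle cure shocked doctors on monday".split()
scores = [0.1, -0.2, 0.0, 2.1, 1.8, 1.5, -0.3, 0.1, 0.2]

start, end = key_fragment(scores, width=3)
fragment = words[start:end]
print(fragment)  # ['miracle', 'cure', 'shocked']
```

A downstream detector could then weight or re-score only the selected fragment, which is one plausible reading of "amplifying" subtle differences in critical segments.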
🛡️ Threat Analysis
The paper proposes a novel AI-generated text detection method that forensically identifies LLM-generated fake news via probability distribution divergence and linguistic fingerprint extraction. It directly targets the integrity and authenticity of LLM-generated content.