RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media
Yudong Li 1, Yufei Sun 2, Yuhan Yao 2, Peiru Yang 1, Wanyue Li 3, Jiajun Zou 1, Yongfeng Huang 1, Linlin Shen 4
Published on arXiv
2509.22055
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
PLAD achieves superior AIGT detection performance over model-based baselines while revealing that AI-generated content exhibits distinct psycholinguistic signatures and different engagement patterns compared to human-authored posts on Xiaohongshu.
PLAD (PsychoLinguistic AIGT Detection Framework)
Novel technique introduced
The proliferation of Large Language Models (LLMs) has led to widespread AI-Generated Text (AIGT) on social media platforms, creating unique challenges where content dynamics are driven by user engagement and evolve over time. However, existing datasets mainly support static AIGT detection. In this work, we introduce RedNote-Vibe, the first longitudinal (5-year) dataset for social media AIGT analysis. The dataset is sourced from the Xiaohongshu platform and contains user engagement metrics (e.g., likes, comments) and timestamps spanning from the pre-LLM period to July 2025, which enables research into the temporal dynamics and user interaction patterns of AIGT. Furthermore, to detect AIGT in the context of social media, we propose the PsychoLinguistic AIGT Detection Framework (PLAD), an interpretable approach that leverages psycholinguistic features. Our experiments show that PLAD achieves superior detection performance and provides insights into the signatures distinguishing human and AI-generated content. More importantly, it reveals the complex relationship between these linguistic features and social media engagement. The dataset is available at https://github.com/testuser03158/RedNote-Vibe.
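To make the kind of temporal and engagement analysis described above concrete, here is a minimal Python sketch. It assumes a hypothetical record layout (`Post` with `posted_on`, `likes`, `comments`, `is_ai`); these field names are illustrative and are not taken from the released dataset, and the helper functions are not part of the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    # Hypothetical record layout; the real dataset schema may differ.
    posted_on: date   # timestamp of the post
    likes: int        # engagement metric
    comments: int     # engagement metric
    is_ai: bool       # label from an annotation or a detector


def yearly_ai_share(posts):
    """Return {year: fraction of that year's posts labeled as AI-generated}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for p in posts:
        totals[p.posted_on.year] += 1
        if p.is_ai:
            flagged[p.posted_on.year] += 1
    return {year: flagged[year] / totals[year] for year in sorted(totals)}


def mean_likes_by_label(posts):
    """Return the mean like count for AI-labeled (True) and human-labeled (False) posts."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for p in posts:
        sums[p.is_ai] += p.likes
        counts[p.is_ai] += 1
    return {label: sums[label] / counts[label] for label in counts}
```

With real data, `yearly_ai_share` would give one point per year on an adoption curve, and `mean_likes_by_label` a first, coarse comparison of engagement between the two groups.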
Key Contributions
- RedNote-Vibe: the first 5-year longitudinal dataset of AIGT from Xiaohongshu (RedNote), with engagement metadata (likes, comments, timestamps) spanning the pre-LLM to post-LLM era
- PLAD (PsychoLinguistic AIGT Detection Framework): an interpretable decision-tree-based detector using psycholinguistic features that outperforms baseline methods
- Empirical analysis of temporal trends in AI adoption on social media and the relationship between psycholinguistic features and user engagement patterns
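The idea of an interpretable, tree-based detector over psycholinguistic features can be illustrated with a small sketch. This is not the PLAD implementation: the two features (`first_person_rate`, `exclaim_rate`) are hypothetical stand-ins chosen for illustration, and the classifier is a decision stump (a one-level decision tree) written in plain Python rather than the authors' model.

```python
def extract_features(text):
    """Compute two illustrative (hypothetical) surface features per character."""
    n = max(len(text), 1)  # avoid division by zero on empty text
    return {
        # Rate of the first-person character "我" (also covers "我们").
        "first_person_rate": text.count("我") / n,
        # Rate of ASCII and full-width exclamation marks.
        "exclaim_rate": (text.count("!") + text.count("！")) / n,
    }


def fit_stump(rows, labels):
    """Fit a one-level decision tree; labels are True for AI-generated.

    Returns (feature_name, threshold, ai_if_above): a post is predicted
    AI-generated when (value > threshold) == ai_if_above.
    """
    best, best_errors = None, None
    for name in rows[0]:
        values = sorted({r[name] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for ai_if_above in (True, False):
                errors = sum(
                    ((r[name] > thr) == ai_if_above) != y
                    for r, y in zip(rows, labels)
                )
                if best_errors is None or errors < best_errors:
                    best, best_errors = (name, thr, ai_if_above), errors
    if best is None:
        raise ValueError("no feature varies across rows; cannot split")
    return best


def predict(stump, row):
    """Apply the fitted rule to one feature dict."""
    name, thr, ai_if_above = stump
    return (row[name] > thr) == ai_if_above
```

The fitted rule is a single readable comparison (one feature, one threshold, one direction), which is what makes tree-style detectors easy to inspect. A deeper tree would chain several such comparisons.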
🛡️ Threat Analysis
The core contribution is AI-generated text detection: both the RedNote-Vibe dataset for benchmarking AIGT detection and the PLAD framework for classifying human vs. AI-authored content fall squarely under output integrity and content provenance. PLAD's psycholinguistic feature approach is a novel detection technique, not merely applying existing methods to a new domain.