Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?

Nasser A Alsadhan

0 citations

Published on arXiv (2603.23219)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

XGBoost models with 8 stylometric features achieve accuracy comparable to high-dimensional BERT classifiers in detecting AI-generated mimicry


Amidst the rising capabilities of generative AI to mimic specific human styles, this study investigates the ability of state-of-the-art large language models (LLMs), including GPT-4o, Gemini 1.5 Pro, and Claude Sonnet 3.5, to emulate the authorial signatures of prominent literary and political figures: Walt Whitman, William Wordsworth, Donald Trump, and Barack Obama. Utilizing a zero-shot prompting framework with strict thematic alignment, we generated synthetic corpora evaluated through a complementary framework combining transformer-based classification (BERT) and interpretable machine learning (XGBoost). Our methodology integrates Linguistic Inquiry and Word Count (LIWC) markers, perplexity, and readability indices to assess the divergence between AI-generated and human-authored text. Results demonstrate that AI-generated mimicry remains highly detectable, with XGBoost models trained on a restricted set of eight stylometric features achieving accuracy comparable to high-dimensional neural classifiers. Feature importance analyses identify perplexity as the primary discriminative metric, revealing a significant divergence in the stochastic regularity of AI outputs compared to the higher variability of human writing. While LLMs exhibit distributional convergence with human authors on low-dimensional heuristic features, such as syntactic complexity and readability, they do not yet fully replicate the nuanced affective density and stylistic variance inherent in the human-authored corpus. By isolating the specific statistical gaps in current generative mimicry, this study provides a comprehensive benchmark for LLM stylistic behavior and offers critical insights for authorship attribution in the digital humanities and social media.
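The headline result is that a gradient-boosted classifier over just eight stylometric features rivals a high-dimensional BERT detector. The sketch below illustrates that low-dimensional setup on synthetic data; the feature names are hypothetical stand-ins for the paper's LIWC, perplexity, and readability markers, and scikit-learn's `GradientBoostingClassifier` is used in place of XGBoost to keep the example dependency-light.

```python
# Minimal sketch, not the paper's pipeline: a boosted-tree detector over
# 8 stylometric features per document, trained on synthetic corpora.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical 8-feature set in the spirit of the paper's stylometric markers.
FEATURES = [
    "perplexity", "flesch_reading_ease", "avg_sentence_length",
    "liwc_affect", "liwc_cognition", "type_token_ratio",
    "punctuation_rate", "syntactic_depth",
]

n = 400
# Simulated finding: AI text (label 1) shows tighter feature variance
# than human text (label 0), echoing its greater stochastic regularity.
human = rng.normal(loc=0.0, scale=1.0, size=(n, len(FEATURES)))
ai = rng.normal(loc=0.6, scale=0.4, size=(n, len(FEATURES)))
X = np.vstack([human, ai])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
# Per-feature importances mirror the paper's feature-importance analysis.
importances = dict(zip(FEATURES, clf.feature_importances_))
```

On real corpora the features would be extracted from text rather than sampled; the point of the sketch is only that a handful of interpretable features can already separate the two classes.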


Key Contributions

  • Complementary detection framework combining BERT and XGBoost for AI authorship attribution
  • Identification of perplexity as the primary discriminative feature for detecting AI-generated mimicry
  • Benchmark showing LLMs remain highly detectable despite distributional convergence on low-dimensional features
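Since perplexity is the primary discriminative feature, it helps to recall its definition: the exponential of the negative mean token log-probability under a language model. A short, self-contained sketch (the log-probability values are illustrative, not taken from the paper):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Illustrative values only: AI-generated text tends to assign uniformly
# high probability to its own tokens (regular, low perplexity), while
# human writing mixes predictable and surprising tokens (higher perplexity).
ai_like = [math.log(0.5)] * 10
human_like = [math.log(p) for p in
              (0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.6, 0.1, 0.9, 0.3)]
low, high = perplexity(ai_like), perplexity(human_like)
```

Here `low` is exactly 2.0 (every token has probability 0.5), while the more variable human-like sequence yields a higher value, matching the divergence the paper attributes to the stochastic regularity of LLM outputs.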

🛡️ Threat Analysis

Output Integrity Attack

Core contribution is detecting AI-generated text (LLM outputs) and verifying authorship authenticity using ML classifiers—this is AI-generated content detection, a primary ML09 use case.


Details

Domains
nlp
Model Types
llm, transformer, traditional_ml
Threat Tags
inference_time
Datasets
Custom corpus of Whitman, Wordsworth, Trump, Obama texts with GPT-4o/Gemini/Claude-generated mimicry
Applications
authorship attribution, ai-generated text detection, digital humanities