On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification

David Condrey

1 citation · 39 references · arXiv

Published on arXiv

2601.17280

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

All attack variants achieve ≥99.8% evasion against five keystroke-based AI authorship classifiers with mean human-confidence ≥0.993, while a formal non-identifiability result shows mutual information between timing features and content provenance is zero for copy-type attacks.

Copy-type attack / timing-forgery

Novel technique introduced


Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($\delta$) of inter-keystroke intervals, to distinguish human-composed text from AI-generated content. We demonstrate that this class of defenses is insecure against two practical attack classes: the copy-type attack, in which a human transcribes LLM-generated text producing authentic motor signals, and timing-forgery attacks, in which automated agents sample inter-keystroke intervals from empirical human distributions. Using 13,000 sessions from the SBU corpus and three timing-forgery variants (histogram sampling, statistical impersonation, and generative LSTM), we show all attacks achieve $\ge$99.8% evasion rates against five classifiers. While detectors achieve AUC=1.000 against fully-automated injection, they classify $\ge$99.8% of attack samples as human with mean confidence $\ge$0.993. We formalize a non-identifiability result: when the detector observes only timing, the mutual information between features and content provenance is zero for copy-type attacks. Although composition and transcription produce statistically distinguishable motor patterns (Cohen's d=1.28), both yield $\delta$ values 2-4× above detection thresholds, rendering the distinction security-irrelevant. These systems confirm a human operated the keyboard, but not whether that human originated the text. Securing provenance requires architectures that bind the writing process to semantic content.
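The statistic the defended systems threshold on is the coefficient of variation of inter-keystroke intervals. A minimal sketch of why it separates naive automated injection from human typing (the synthetic distributions and the 0.1 threshold here are illustrative assumptions, not the paper's fitted values):

```python
import numpy as np

def coefficient_of_variation(intervals):
    """Coefficient of variation (std / mean) of inter-keystroke intervals.

    Human motor timing is highly variable, so this statistic is large;
    fixed-rate automated injection yields a value near zero.
    """
    intervals = np.asarray(intervals, dtype=float)
    return intervals.std() / intervals.mean()

rng = np.random.default_rng(0)

# Near-uniform "bot" injection: ~50 ms per key with negligible jitter.
bot = 0.05 + rng.normal(0.0, 1e-4, size=200)

# Human-like intervals: a log-normal spread typical of motor timing.
human = rng.lognormal(mean=-2.0, sigma=0.6, size=200)

# A threshold detector on this statistic separates the two cleanly.
assert coefficient_of_variation(bot) < 0.1 < coefficient_of_variation(human)
```

This is exactly the separation that gives the detectors AUC=1.000 against fully-automated injection; the paper's point is that both attack classes place their samples on the human side of any such threshold.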


Key Contributions

  • Defines the copy-type attack and proves via mutual information analysis that it is formally non-identifiable by any classifier operating solely on keystroke timing
  • Demonstrates three timing-forgery attack variants (histogram sampling, statistical impersonation, generative LSTM) achieving ≥99.8% evasion against five classifiers on 13,000 SBU corpus sessions
  • Establishes that composition-vs-transcription motor differences (Cohen's d=1.28) are operationally unexploitable at acceptable false-rejection rates, invalidating the core assumption of keystroke-based authorship detection
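The simplest of the three forgery variants, histogram sampling, can be sketched in a few lines: build an empirical histogram of human inter-keystroke intervals and draw forged intervals from it. This is a hedged illustration, not the paper's implementation; the synthetic log-normal stand-in replaces the SBU corpus data, and the 0.2 tolerance is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for empirical human inter-keystroke intervals (seconds); in the
# paper these would come from SBU corpus sessions, not a synthetic draw.
human_intervals = rng.lognormal(mean=-2.0, sigma=0.6, size=5000)

def forge_intervals(empirical, n, bins=50, rng=rng):
    """Histogram-sampling attack: draw n forged intervals whose marginal
    distribution matches the empirical human histogram."""
    counts, edges = np.histogram(empirical, bins=bins)
    probs = counts / counts.sum()
    # Pick a bin with its empirical probability, then a uniform point in it.
    idx = rng.choice(bins, size=n, p=probs)
    return rng.uniform(edges[idx], edges[idx + 1])

forged = forge_intervals(human_intervals, n=300)

cv = lambda x: x.std() / x.mean()
# The forged coefficient of variation tracks the human one, so a
# threshold detector on this statistic classifies the session as human.
assert abs(cv(forged) - cv(human_intervals)) < 0.2
```

Because the detector sees only the timing marginal, matching that marginal is sufficient; the statistical-impersonation and generative-LSTM variants refine this by also matching higher-order temporal structure.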

🛡️ Threat Analysis

Output Integrity Attack

The paper attacks AI-generated content provenance verification systems (keystroke-based authorship detectors) — systems designed to distinguish human-composed from AI-generated text — showing they are defeatable with ≥99.8% evasion. Defeating content integrity/provenance detection mechanisms is ML09, not ML01, per the category guidance on defeating content protection schemes.


Details

Domains
nlp, tabular
Model Types
traditional_ml, rnn
Threat Tags
black_box, inference_time
Datasets
SBU Keystroke Corpus
Applications
ai authorship detection, keystroke-based content provenance verification, academic integrity platforms