On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification
Published on arXiv
2601.17280
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
All attack variants achieve ≥99.8% evasion against five keystroke-based AI authorship classifiers with mean human-confidence ≥0.993, and a formal non-identifiability result shows that the mutual information between timing features and content provenance is zero for copy-type attacks.
Copy-type attack / timing-forgery
Novel technique introduced
Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($\delta$) of inter-keystroke intervals, to distinguish human-composed text from AI-generated content. We demonstrate that this class of defenses is insecure against two practical attack classes: the copy-type attack, in which a human transcribes LLM-generated text and thereby produces authentic motor signals, and timing-forgery attacks, in which automated agents sample inter-keystroke intervals from empirical human distributions. Using 13,000 sessions from the SBU corpus and three timing-forgery variants (histogram sampling, statistical impersonation, and generative LSTM), we show all attacks achieve $\geq$99.8% evasion rates against five classifiers. While detectors achieve AUC=1.000 against fully automated injection, they classify $\geq$99.8% of attack samples as human with mean confidence $\geq$0.993. We formalize a non-identifiability result: when the detector observes only timing, the mutual information between features and content provenance is zero for copy-type attacks. Although composition and transcription produce statistically distinguishable motor patterns (Cohen's d=1.28), both yield $\delta$ values 2-4$\times$ above detection thresholds, rendering the distinction security-irrelevant. These systems confirm that a human operated the keyboard, but not whether that human originated the text. Securing provenance requires architectures that bind the writing process to semantic content.
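The simplest timing-forgery variant, histogram sampling, can be illustrated in a few lines. The sketch below is not the paper's code: the log-normal "human" intervals, bin count, and helper names are illustrative assumptions. It draws forged inter-keystroke intervals from the empirical distribution of observed human intervals and shows that the resulting coefficient of variation $\delta$ closely matches the human one, which is exactly the feature the detectors test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical human inter-keystroke intervals in ms; a log-normal is a
# common stand-in for empirical keystroke timing data.
human_ikis = rng.lognormal(mean=np.log(150), sigma=0.6, size=5000)

def forge_intervals(human_samples, n, bins=50, rng=rng):
    """Histogram-sampling forgery: draw intervals from the empirical
    distribution of observed human inter-keystroke intervals."""
    counts, edges = np.histogram(human_samples, bins=bins)
    probs = counts / counts.sum()
    idx = rng.choice(len(counts), size=n, p=probs)
    # Sample uniformly within each chosen histogram bin.
    return rng.uniform(edges[idx], edges[idx + 1])

def delta(intervals):
    """Coefficient of variation of inter-keystroke intervals."""
    return intervals.std() / intervals.mean()

forged = forge_intervals(human_ikis, n=2000)
print(f"human  delta = {delta(human_ikis):.3f}")
print(f"forged delta = {delta(forged):.3f}")
```

A detector thresholding on $\delta$ sees statistically human-like variability in the forged stream, which is consistent with the reported ≥99.8% evasion rates for this variant.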
Key Contributions
- Defines the copy-type attack and proves via mutual information analysis that it is formally non-identifiable by any classifier operating solely on keystroke timing
- Demonstrates three timing-forgery attack variants (histogram sampling, statistical impersonation, generative LSTM) achieving ≥99.8% evasion against five classifiers on 13,000 SBU corpus sessions
- Establishes that composition-vs-transcription motor differences (Cohen's d=1.28) are operationally unexploitable at acceptable false-rejection rates, invalidating the core assumption of keystroke-based authorship detection
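The non-identifiability claim for copy-type attacks has a direct numerical illustration. The toy distributions below are assumptions for the sketch, not data from the paper: because a human physically types in both the composed and transcribed cases, the conditional timing distribution is the same under either provenance, so the joint factors as a product and the mutual information is zero.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table p(x, y)."""
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Toy timing-feature distribution over 4 discretized delta bins.
p_timing = np.array([0.1, 0.4, 0.4, 0.1])

# Copy-type attack: a human types in both cases, so timing is
# distributed identically whether the text was human-composed or
# transcribed from an LLM. The joint is an outer product, i.e.
# timing and provenance are independent.
p_provenance = np.array([0.5, 0.5])   # [human-origin, LLM-origin]
joint = np.outer(p_timing, p_provenance)

print(mutual_information(joint))  # ≈ 0: timing carries no provenance signal
```

No classifier operating on the timing features alone can do better than chance here, regardless of its capacity, which is the formal content of the first contribution.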
🛡️ Threat Analysis
The paper attacks AI-generated content provenance verification systems (keystroke-based authorship detectors) — systems designed to distinguish human-composed from AI-generated text — showing they can be defeated with ≥99.8% evasion. Defeating content integrity/provenance detection mechanisms is ML09, not ML01, per the category guidance on defeating content protection schemes.