On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification
Published on arXiv
2601.17280
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
All attack variants achieve ≥99.8% evasion against five keystroke-based AI authorship classifiers with mean human-confidence ≥0.993, and a formal non-identifiability result shows that the mutual information between timing features and content provenance is zero for copy-type attacks.
Copy-type attack / timing-forgery
Novel technique introduced
Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($\delta$) of inter-keystroke intervals, to distinguish human-composed text from AI-generated content. We demonstrate that this class of defenses is insecure against two practical attack classes: the copy-type attack, in which a human transcribes LLM-generated text and thereby produces authentic motor signals, and timing-forgery attacks, in which automated agents sample inter-keystroke intervals from empirical human distributions. Using 13,000 sessions from the SBU corpus and three timing-forgery variants (histogram sampling, statistical impersonation, and generative LSTM), we show all attacks achieve $\geq$99.8% evasion rates against five classifiers. While detectors achieve AUC=1.000 against fully automated injection, they classify $\geq$99.8% of attack samples as human with mean confidence $\geq$0.993. We formalize a non-identifiability result: when the detector observes only timing, the mutual information between features and content provenance is zero for copy-type attacks. Although composition and transcription produce statistically distinguishable motor patterns (Cohen's d=1.28), both yield $\delta$ values 2-4$\times$ above detection thresholds, rendering the distinction security-irrelevant. These systems confirm that a human operated the keyboard, but not whether that human originated the text. Securing provenance requires architectures that bind the writing process to semantic content.
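The simplest timing-forgery variant, histogram sampling, can be illustrated in a few lines. The sketch below is not the paper's code: the log-normal "human" intervals, bin count, and helper names are illustrative assumptions. It draws forged inter-keystroke intervals from the empirical distribution of observed human intervals and shows that the resulting coefficient of variation $\delta$ closely matches the human one, which is exactly the feature the detectors test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical human inter-keystroke intervals in ms; a log-normal is a
# common stand-in for empirical keystroke timing data.
human_ikis = rng.lognormal(mean=np.log(150), sigma=0.6, size=5000)

def forge_intervals(human_samples, n, bins=50, rng=rng):
    """Histogram-sampling forgery: draw intervals from the empirical
    distribution of observed human inter-keystroke intervals."""
    counts, edges = np.histogram(human_samples, bins=bins)
    probs = counts / counts.sum()
    idx = rng.choice(len(counts), size=n, p=probs)
    # Sample uniformly within each chosen histogram bin.
    return rng.uniform(edges[idx], edges[idx + 1])

def delta(intervals):
    """Coefficient of variation of inter-keystroke intervals."""
    return intervals.std() / intervals.mean()

forged = forge_intervals(human_ikis, n=2000)
print(f"human  delta = {delta(human_ikis):.3f}")
print(f"forged delta = {delta(forged):.3f}")
```

A detector thresholding on $\delta$ sees statistically human-like variability in the forged stream, which is consistent with the reported ≥99.8% evasion rates for this variant.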
Key Contributions
- Defines the copy-type attack and proves via mutual information analysis that it is formally non-identifiable by any classifier operating solely on keystroke timing
- Demonstrates three timing-forgery attack variants (histogram sampling, statistical impersonation, generative LSTM) achieving ≥99.8% evasion against five classifiers on 13,000 SBU corpus sessions
- Establishes that composition-vs-transcription motor differences (Cohen's d=1.28) are operationally unexploitable at acceptable false-rejection rates, invalidating the core assumption of keystroke-based authorship detection
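The non-identifiability claim for copy-type attacks has a direct numerical illustration. The toy distributions below are assumptions for the sketch, not data from the paper: because a human physically types in both the composed and transcribed cases, the conditional timing distribution is the same under either provenance, so the joint factors as a product and the mutual information is zero.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table p(x, y)."""
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Toy timing-feature distribution over 4 discretized delta bins.
p_timing = np.array([0.1, 0.4, 0.4, 0.1])

# Copy-type attack: a human types in both cases, so timing is
# distributed identically whether the text was human-composed or
# transcribed from an LLM. The joint is an outer product, i.e.
# timing and provenance are independent.
p_provenance = np.array([0.5, 0.5])   # [human-origin, LLM-origin]
joint = np.outer(p_timing, p_provenance)

print(mutual_information(joint))  # ≈ 0: timing carries no provenance signal
```

No classifier operating on the timing features alone can do better than chance here, regardless of its capacity, which is the formal content of the first contribution.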
🛡️ Threat Analysis
The paper attacks AI-generated content provenance verification systems (keystroke-based authorship detectors) — systems designed to distinguish human-composed from AI-generated text — showing they can be defeated with ≥99.8% evasion. Defeating content integrity/provenance detection mechanisms is ML09, not ML01, per the category guidance on defeating content protection schemes.