Complete Evasion, Zero Modification: PDF Attacks on AI Text Detection
Published on arXiv: 2508.01887
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
By manipulating character positioning in a PDF so that text extraction order no longer matches visual reading order, PDFuzz reduces ArguGPT detector accuracy from 93.6% to a random-level 50.4% (F1: 0.938→0.0) while preserving perfect visual fidelity.
PDFuzz
Novel technique introduced
AI-generated text detectors have become essential tools for maintaining content authenticity, yet their robustness against evasion attacks remains questionable. We present PDFuzz, a novel attack that exploits the discrepancy between the visual text layout of a PDF document and the order in which its text is extracted. Our method preserves the exact textual content while manipulating character positions to scramble the extraction sequence. We evaluate this approach against the ArguGPT detector using a dataset of human-written and AI-generated text. Our results demonstrate complete evasion: detector performance drops from (93.6 ± 1.4)% accuracy and an F1 score of 0.938 ± 0.014 to random-level performance ((50.4 ± 3.2)% accuracy, F1 score of 0.0) while maintaining perfect visual fidelity. Our work reveals a vulnerability in current detection systems that is inherent to the structure of PDF documents and underscores the need for robust safeguards against such attacks. We make our code publicly available at https://github.com/ACMCMC/PDFuzz.
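The core idea is that a PDF draws each character at an explicit position, so the order of the drawing commands need not match the reading order. A toy model can illustrate this. It is a minimal sketch under stated assumptions, not the PDFuzz implementation (which lives in the repository linked above). It assumes every character is drawn individually with the standard PDF text operators `Tm` (set the text matrix, i.e. position) and `Tj` (show a string), and that a naive extractor concatenates characters in content-stream order. The names `Glyph`, `layout_line`, `scramble_stream_order`, `to_content_stream`, and the two `extract_*` functions are hypothetical. The fixed character advance and the `/F1` font resource are placeholders for real font metrics and page resources.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One character drawn at an absolute (x, y) position on the page."""
    char: str
    x: float
    y: float


def layout_line(text: str, x0: float = 72.0, y: float = 720.0, advance: float = 6.0) -> list[Glyph]:
    """Place each character on one line using a fixed advance (hypothetical monospace metrics)."""
    return [Glyph(ch, x0 + i * advance, y) for i, ch in enumerate(text)]


def scramble_stream_order(glyphs: list[Glyph], seed: int = 0) -> list[Glyph]:
    """Shuffle the order in which glyphs are emitted; positions, and so the rendering, are unchanged."""
    shuffled = list(glyphs)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def to_content_stream(glyphs: list[Glyph], font: str = "/F1", size: int = 10) -> str:
    """Serialize glyphs as PDF text operators: one Tm (position) + Tj (show) pair per character."""
    ops = ["BT", f"{font} {size} Tf"]
    for g in glyphs:
        # Escape the characters that are special inside PDF literal strings.
        escaped = g.char.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        ops.append(f"1 0 0 1 {g.x:.2f} {g.y:.2f} Tm ({escaped}) Tj")
    ops.append("ET")
    return "\n".join(ops)


def extract_stream_order(glyphs: list[Glyph]) -> str:
    """Naive extraction (assumed): concatenate characters in content-stream order."""
    return "".join(g.char for g in glyphs)


def extract_visual_order(glyphs: list[Glyph]) -> str:
    """Reading order: top-to-bottom (PDF y grows upward), then left-to-right."""
    return "".join(g.char for g in sorted(glyphs, key=lambda g: (-g.y, g.x)))


text = "The quick brown fox"
glyphs = scramble_stream_order(layout_line(text), seed=42)

print(extract_visual_order(glyphs))  # The quick brown fox
print(extract_stream_order(glyphs))  # same characters, permuted
print(to_content_stream(glyphs).splitlines()[2])  # first positioned glyph in the stream
```

Because only the emission order changes, the rendered page is identical, yet a detector fed the stream-order text sees a permutation of the characters. Real extractors differ in how much layout analysis they perform, so whether a given tool behaves like the naive extractor here is an assumption to check per tool.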
Key Contributions
- PDFuzz: the first PDF-based text-ordering attack, exploiting the gap between the visual character layout of a PDF document and its text extraction order
- Complete evasion of the ArguGPT detector (93.6%→50.4% accuracy, F1 0.938→0.0) with no modification to content, visual appearance, or semantics
- Empirical demonstration that current AI text detectors are fundamentally vulnerable to manipulation at the document-format level, not only to content-level attacks
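The contributions above pair an F1 of 0.0 with near-chance accuracy. Since F1 = 2TP / (2TP + FP + FN), and assuming AI-generated text is the positive class, an F1 of 0.0 means the detector produced no true positives, so every correct prediction was a true negative. The sketch below shows one way the two figures can co-occur. The helpers `confusion_counts` and `accuracy_and_f1` are hypothetical, and the 250/250 split and all-"human" predictions are invented for illustration, not taken from the paper's data.

```python
def confusion_counts(y_true, y_pred, positive="ai"):
    """Return (TP, FP, FN, TN) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred, positive="ai"):
    """Accuracy over all samples and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # F1 = 2TP / (2TP + FP + FN); it is 0.0 whenever TP == 0.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical balanced test set: 250 human-written and 250 AI-generated texts.
y_true = ["human"] * 250 + ["ai"] * 250
# Hypothetical post-attack detector that labels every document as human-written.
y_pred = ["human"] * 500

print(accuracy_and_f1(y_true, y_pred))  # (0.5, 0.0)
```

Random-level accuracy therefore does not require random predictions: a detector that collapses to a single label on a balanced set scores about 50% accuracy with an F1 of 0.0.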
🛡️ Threat Analysis
PDFuzz attacks an AI-generated text detection system (ArguGPT) by manipulating PDF document structure to scramble text extraction order. This defeats content authenticity verification while leaving visual appearance and semantics intact, making it a direct attack on output integrity and on AI-generated content detection infrastructure.