Complete Evasion, Zero Modification: PDF Attacks on AI Text Detection

AI-generated text detectors have become essential tools for maintaining content authenticity, yet their robustness against evasion attacks remains questionable. We present PDFuzz, a novel attack that exploits the discrepancy between visual text layout and extraction order in PDF documents. Our method preserves the exact textual content while manipulating character positioning to scramble the extraction sequence. We evaluate this approach against the ArguGPT detector using a dataset of human-written and AI-generated text. Our results demonstrate complete evasion: detector performance drops from 93.6 ± 1.4% accuracy and an F1 score of 0.938 ± 0.014 to chance-level performance (50.4 ± 3.2% accuracy, F1 score of 0.0), while maintaining perfect visual fidelity. Our work reveals a vulnerability in current detection systems that is inherent to PDF document structure and underscores the need for robust safeguards against such attacks. We make our code publicly available at https://github.com/ACMCMC/PDFuzz.
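To make the layout-versus-extraction gap concrete, here is a minimal sketch, not the PDFuzz implementation. It writes a PDF text-drawing sequence (a content stream) in which every character is placed at its correct on-page coordinates using an absolute Tm text matrix, but the drawing operations are emitted in shuffled order. It assumes a single line of text and a font resource named /F1 mapped to Courier (every glyph advances 600/1000 em); the helper names scrambled_content_stream, extract_stream_order, and extract_visual_order are hypothetical. The two toy extractors contrast a reader that concatenates glyphs in drawing order with one that reassembles them by position.

```python
import random

FONT_SIZE = 12
# Assumption: /F1 is Courier, so every glyph advances 600/1000 em.
CHAR_ADVANCE = 0.6 * FONT_SIZE


def _escape(ch: str) -> str:
    # Backslash and parentheses must be escaped inside PDF literal strings.
    return "\\" + ch if ch in "\\()" else ch


def scrambled_content_stream(text: str, x0: float = 72.0, y0: float = 720.0,
                             seed: int = 0) -> str:
    """Draw each character at its true position, but in shuffled order."""
    order = list(range(len(text)))
    random.Random(seed).shuffle(order)
    ops = ["BT", f"/F1 {FONT_SIZE} Tf"]
    for i in order:
        x = x0 + i * CHAR_ADVANCE
        # Tm sets an absolute text matrix, so each glyph lands in the same
        # spot no matter which drawing operation preceded it.
        ops.append(f"1 0 0 1 {x:.2f} {y0:.2f} Tm ({_escape(text[i])}) Tj")
    ops.append("ET")
    return "\n".join(ops)


def _parse_glyphs(stream: str) -> list[tuple[float, float, str]]:
    """Recover (x, y, char) triples from the lines emitted above."""
    glyphs = []
    for line in stream.splitlines():
        if not line.endswith(" Tj"):
            continue
        *_, x, y, rest = line.split(" ", 6)
        literal = rest[len("Tm ("):-len(") Tj")]
        ch = literal[1:] if literal.startswith("\\") else literal
        glyphs.append((float(x), float(y), ch))
    return glyphs


def extract_stream_order(stream: str) -> str:
    # Naive extraction: concatenate glyphs in the order they are drawn.
    return "".join(ch for _, _, ch in _parse_glyphs(stream))


def extract_visual_order(stream: str) -> str:
    # Layout-aware extraction: top to bottom (PDF y grows upward), then left to right.
    glyphs = sorted(_parse_glyphs(stream), key=lambda g: (-g[1], g[0]))
    return "".join(ch for _, _, ch in glyphs)


if __name__ == "__main__":
    essay = "This essay was written by a language model."
    stream = scrambled_content_stream(essay, seed=42)
    print("stream order:", extract_stream_order(stream))
    print("visual order:", extract_visual_order(stream))
```

Wrapped in a PDF page whose /F1 resource is Courier, this stream would render the original sentence, while a reader that follows drawing order would hand a detector a scrambled string. Real extractors differ in how much layout analysis they perform, so how well this toy transfers to a given pipeline is an empirical question.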

Key Contributions

PDFuzz: the first PDF-based text-ordering attack, exploiting the gap between visual character layout and extraction order in PDF documents
Complete evasion of ArguGPT detector (93.6%→50.4% accuracy, F1 0.938→0.0) with zero content, visual, or semantic modification
Empirical demonstration that current AI text detectors are fundamentally vulnerable to document-format-level manipulation rather than only content-level attacks
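The pairing of roughly 50% accuracy with an F1 score of exactly 0.0 is worth unpacking. F1 for the AI-generated class is 2·TP / (2·TP + FP + FN), which is zero exactly when there are no true positives, i.e., the detector flags none of the AI-generated texts. One scenario consistent with the reported numbers is a detector that labels every scrambled text as human: on a roughly balanced test set it still gets the human half right, so accuracy lands near 50%. The sketch below assumes AI-generated is the positive class and uses a hypothetical 500 + 500 split, not the paper's actual data; accuracy_and_f1 is an illustrative helper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy over all texts and F1 for the `positive` (AI-generated) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


HUMAN, AI = 0, 1
# Hypothetical balanced test set: 500 human-written and 500 AI-generated texts.
y_true = [HUMAN] * 500 + [AI] * 500

# A perfect detector, for reference.
print(accuracy_and_f1(y_true, y_true))  # (1.0, 1.0)

# A detector that labels every text as human after scrambling.
all_human = [HUMAN] * len(y_true)
print(accuracy_and_f1(y_true, all_human))  # (0.5, 0.0)
```

Accuracy alone would make this collapse look like coin flipping; the zero F1 shows the detector has stopped recognizing AI-generated text altogether.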

Threat Analysis

Output Integrity Attack

PDFuzz attacks an AI-generated text detection system (ArguGPT) by manipulating PDF document structure to scramble text extraction order. This defeats content authenticity verification while leaving visual appearance and semantics intact, making it a direct attack on output integrity and on AI-generated content detection infrastructure.

Details

Domains

nlp

Model Types

transformer

Threat Tags

black_box, inference_time, targeted, digital

Datasets

ArguGPT dataset

Year: 2025
