attack 2025

Complete Evasion, Zero Modification: PDF Attacks on AI Text Detection

Aldan Creo

0 citations


Published on arXiv

2508.01887

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PDFuzz reduces ArguGPT detector accuracy from 93.6% to random-level 50.4% (F1: 0.938→0.0) while preserving perfect visual fidelity through PDF character-position scrambling.

PDFuzz

Novel technique introduced


AI-generated text detectors have become essential tools for maintaining content authenticity, yet their robustness against evasion attacks remains questionable. We present PDFuzz, a novel attack that exploits the discrepancy between visual text layout and extraction order in PDF documents. Our method preserves exact textual content while manipulating character positioning to scramble extraction sequences. We evaluate this approach against the ArguGPT detector using a dataset of human and AI-generated text. Our results demonstrate complete evasion: detector performance drops from 93.6 ± 1.4% accuracy and 0.938 ± 0.014 F1 score to random-level performance (50.4 ± 3.2% accuracy, 0.0 F1 score) while maintaining perfect visual fidelity. Our work reveals a vulnerability in current detection systems that is inherent to PDF document structures and underscores the need for implementing sturdy safeguards against such attacks. We make our code publicly available at https://github.com/ACMCMC/PDFuzz.
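The gap between visual layout and extraction order can be illustrated with a toy model. The sketch below is not the PDFuzz implementation and uses no PDF library; it assumes a simplified page where each glyph is a (character, x-position) pair, and all function names are hypothetical. Shuffling the emission order while leaving positions untouched keeps the rendered line identical but scrambles what a stream-order extractor reads.

```python
import random

def layout(text, char_width=6.0):
    """Assign each character a fixed x position on a single line (toy layout)."""
    return [(ch, i * char_width) for i, ch in enumerate(text)]

def scramble_stream(glyphs, seed=0):
    """Shuffle the order in which glyphs are emitted; positions stay untouched."""
    shuffled = list(glyphs)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def extract_stream_order(glyphs):
    """Naive extractor (assumed behavior): concatenate characters in emission order."""
    return "".join(ch for ch, _ in glyphs)

def render_visual_order(glyphs):
    """What a reader sees: characters ordered by their x position."""
    return "".join(ch for ch, _ in sorted(glyphs, key=lambda g: g[1]))

text = "This essay was written by a language model."
scrambled = scramble_stream(layout(text))
visual = render_visual_order(scrambled)      # identical to the original text
extracted = extract_stream_order(scrambled)  # same characters, scrambled order
```

In this model the detector would receive `extracted`, which contains exactly the same characters as `text` but no longer reads as prose, while a human viewing the page sees `visual`.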


Key Contributions

  • PDFuzz: first PDF-based text ordering attack exploiting the gap between visual character layout and extraction order in PDF documents
  • Complete evasion of ArguGPT detector (93.6%→50.4% accuracy, F1 0.938→0.0) with zero content, visual, or semantic modification
  • Empirical demonstration that current AI text detectors are fundamentally vulnerable to document-format-level manipulation rather than only content-level attacks
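One way to read the reported pair of roughly 50% accuracy and 0.0 F1 is that the detector stops flagging any document as AI-generated. The sketch below works through that arithmetic under stated assumptions: AI-generated text is the positive class, the evaluation set is roughly balanced, and the counts are hypothetical rather than taken from the paper.

```python
def accuracy_and_f1(tp, fp, tn, fn):
    """Compute accuracy and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    denom = 2 * tp + fp + fn
    # F1 is defined as 0.0 here when there are no positives at all.
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Hypothetical balanced set: 500 AI-generated and 500 human documents,
# with the detector labelling every document as human.
acc, f1 = accuracy_and_f1(tp=0, fp=0, tn=500, fn=500)
print(acc, f1)  # 0.5 0.0
```

With no true positives, F1 collapses to zero regardless of how many human documents are classified correctly, so accuracy alone understates the failure.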

🛡️ Threat Analysis

Output Integrity Attack

PDFuzz attacks an AI-generated text detection system (ArguGPT) by manipulating PDF document structure to scramble text extraction order, defeating content authenticity verification while leaving visual appearance and semantics intact — a direct attack on output integrity and AI-generated content detection infrastructure.


Details

Domains
nlp
Model Types
transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
ArguGPT dataset
Applications
ai-generated text detection, academic integrity tools, content authenticity verification