DoPE: Decoy Oriented Perturbation Encapsulation Human-Readable, AI-Hostile Documents for Academic Integrity
Ashish Raj Shekhar , Shiven Agarwal , Priyanuj Bordoloi , Yash Shah , Tejas Anvekar , Vivek Gupta
Published on arXiv
2601.12505
Output Integrity Attack
OWASP ML Top 10 — ML09
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Against black-box OpenAI and Anthropic MLLMs, DoPE achieves 91.4% detection at 8.7% FPR and prevents successful completion or induces decoy-aligned failures in 96.3% of attempts
DoPE (Decoy-Oriented Perturbation Encapsulation)
Novel technique introduced
Multimodal Large Language Models (MLLMs) can directly consume exam documents, threatening conventional assessments and academic integrity. We present DoPE (Decoy-Oriented Perturbation Encapsulation), a document-layer defense framework that embeds semantic decoys into PDF/HTML assessments to exploit render-parse discrepancies in MLLM pipelines. By instrumenting exams at authoring time, DoPE provides model-agnostic prevention (stop or confound automated solving) and detection (flag blind AI reliance) without relying on conventional one-shot classifiers. We formalize prevention and detection tasks, and introduce FewSoRT-Q, an LLM-guided pipeline that generates question-level semantic decoys and FewSoRT-D to encapsulate them into watermarked documents. We evaluate on Integrity-Bench, a novel benchmark of 1826 exams (PDF+HTML) derived from public QA datasets and OpenCourseWare. Against black-box MLLMs from OpenAI and Anthropic, DoPE yields strong empirical gains: a 91.4% detection rate at an 8.7% false-positive rate using an LLM-as-Judge verifier, and prevents successful completion or induces decoy-aligned failures in 96.3% of attempts. We release Integrity-Bench, our toolkit, and evaluation code to enable reproducible study of document-layer defenses for academic integrity.
Key Contributions
- DoPE framework that embeds semantic decoys into PDF/HTML exam documents by exploiting render-parse discrepancies in MLLM pipelines, achieving 96.3% prevention of successful MLLM-assisted completion
- FewSoRT-Q/D pipeline: LLM-guided generation of question-level semantic decoys (FewSoRT-Q) and their encapsulation into visually unchanged, watermarked documents (FewSoRT-D)
- Integrity-Bench: a novel benchmark of 1,826 paired PDF+HTML exams with multiple watermarked variants, enabling controlled evaluation of document-layer defenses against black-box MLLMs
🛡️ Threat Analysis
The detection component watermarks exam documents with semantic decoys to identify when MLLMs are used to solve exams — verifying content integrity via an LLM-as-Judge verifier that flags decoy-aligned answers as evidence of AI reliance.