attack 2025

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

Xuyang Guo ¹, Zekai Huang ², Zhao Song ³, Jiahao Zhang

¹ Guilin University of Electronic Technology

² The Ohio State University

³ University of California, Berkeley

0 citations

Published on arXiv

2508.13214

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

LLMs are reliably misled by hidden prompts injected into PDF files even when the underlying arithmetic questions are trivially simple, exposing serious robustness risks for LLM-as-a-judge deployments.

PDF-embedded hidden prompt injection

Novel technique introduced

Large Language Models (LLMs) have recently demonstrated strong emergent abilities in complex reasoning and zero-shot generalization, showing unprecedented potential for LLM-as-a-judge applications in education, peer review, and data quality evaluation. However, their robustness under prompt injection attacks, where malicious instructions are embedded into the content to manipulate outputs, remains a significant concern. In this work, we explore a frustratingly simple yet effective attack setting to test whether LLMs can be easily misled. Specifically, we evaluate LLMs on basic arithmetic questions (e.g., "What is 3 + 2?") presented as either multiple-choice or true-false judgment problems within PDF files, where hidden prompts are injected into the file. Our results reveal that LLMs are indeed vulnerable to such hidden prompt injection attacks, even in these trivial scenarios, highlighting serious robustness risks for LLM-as-a-judge applications.

Key Contributions

Identifies a critical vulnerability in LLM-as-a-judge systems: hidden prompt injection via PDF files breaks correct answering even on trivial arithmetic and true/false questions.
Demonstrates that this attack is effective across multiple LLMs, highlighting systemic robustness failures for educational, peer review, and data quality evaluation use cases.
Establishes a minimal, reproducible attack setting (simple MCQ in PDF + injected instruction) that exposes the gap between LLM emergent reasoning ability and prompt injection robustness.

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Applications

llm-as-a-judgeautomated exam gradingpeer review automationdata quality evaluation

Read PDF arXiv

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation

Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

PINA: Prompt Injection Attack against Navigation Agents

Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

Semantic Representation Attack against Aligned Large Language Models

MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

AgenticRed: Optimizing Agentic Systems for Automated Red-teaming