attack 2025

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

Panagiotis Theocharopoulos ¹, Ajinkya Kulkarni ², Mathew Magimai.-Doss ²

¹ International School of Athens

² Idiap Research Institute

0 citations · 13 references · arXiv

Published on arXiv

2512.23684

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Hidden prompt injections in English, Japanese, and Chinese cause substantial changes in LLM-assigned review scores and accept/reject decisions, while Arabic injections have little to no effect, revealing language-dependent susceptibility.

Multilingual Hidden Prompt Injection

Novel technique introduced

Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML and evaluate the effect of embedding hidden adversarial prompts within these documents. Each paper is injected with semantically equivalent instructions in four different languages and reviewed using an LLM. We find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight the susceptibility of LLM-based reviewing systems to document-level prompt injection and reveal notable differences in vulnerability across languages.
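The comparison described above (review scores and accept/reject decisions with and without an injected prompt, grouped by injection language) can be sketched as a small per-language summary. This is a minimal illustration only, not the authors' evaluation code: the record layout, the `summarize` helper, the numeric score scale, and every value in `reviews` are hypothetical placeholders rather than results from the paper.

```python
from statistics import mean

# Hypothetical records, one per (paper, injection language):
# (language, baseline_score, injected_score, baseline_decision, injected_decision)
# All values are illustrative placeholders, NOT data from the paper.
reviews = [
    ("en", 6, 9, "reject", "accept"),
    ("en", 7, 9, "accept", "accept"),
    ("ar", 6, 6, "reject", "reject"),
    ("ar", 7, 8, "accept", "accept"),
]

def summarize(records):
    """Return {language: (mean score shift, decision flip rate)}."""
    by_lang = {}
    for lang, base, injected, base_dec, injected_dec in records:
        # Score shift is injected minus baseline; a flip is any decision change.
        by_lang.setdefault(lang, []).append((injected - base, base_dec != injected_dec))
    return {
        lang: (
            mean(shift for shift, _ in rows),
            sum(flipped for _, flipped in rows) / len(rows),
        )
        for lang, rows in by_lang.items()
    }

summary = summarize(reviews)
for lang, (shift, flip_rate) in summary.items():
    print(f"{lang}: mean score shift {shift:+.2f}, decision flip rate {flip_rate:.0%}")
```

Reporting the mean score shift and the decision flip rate separately is one reasonable design choice here, since an injection could move scores without crossing the accept/reject threshold.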

Key Contributions

Dataset of ~500 real ICML-accepted papers injected with hidden adversarial prompts in four languages (English, Japanese, Chinese, Arabic)
Empirical demonstration that English, Japanese, and Chinese injections substantially change LLM review scores and accept/reject decisions
Discovery of language-dependent vulnerability: Arabic injections produce negligible effect, revealing uneven multilingual instruction-following in alignment-tuned LLMs

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted, digital

Datasets

ICML 2024 accepted papers (~500 papers)

Applications

llm-based academic peer review


Similar Papers

Self-HarmLLM: Can Large Language Model Harm Itself?

A Whole New World: Creating a Parallel-Poisoned Web Only AI-Agents Can See

Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review

Automating Agent Hijacking via Structural Template Injection

Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models

Jailbreaking LLMs via Semantically Relevant Nested Scenarios with Targeted Toxic Knowledge