defense 2025

ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

Kanchon Gharami ¹, Sanjiv Kumar Sarkar ², Yongxin Liu ¹, Shafika Showkat Moni ¹

¹ Embry-Riddle Aeronautical University

² Axelon Services Corporation

0 citations · 23 references · arXiv

Published on arXiv

2512.20405

Output Integrity Attack

OWASP ML Top 10 — ML09

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Demonstrates that hidden PDF prompts can systematically manipulate LLM reviewers toward positive outcomes, and that editor-injected trigger traps can reliably distinguish AI-generated reviews from human ones without modifying the underlying LLM

inject-and-detect

Novel technique introduced

Large Language Models (LLMs) like ChatGPT are now widely used in writing and reviewing scientific papers. While this trend accelerates publication growth and reduces human workload, it also introduces serious risks. Papers written or reviewed by LLMs may lack real novelty, contain fabricated or biased results, or mislead downstream research that others depend on. Such issues can damage reputations, waste resources, and even endanger lives when flawed studies influence medical or safety-critical systems. This research explores both the offensive and defensive sides of this growing threat. On the attack side, we demonstrate how an author can inject hidden prompts inside a PDF that secretly guide or "jailbreak" LLM reviewers into giving overly positive feedback and biased acceptance. On the defense side, we propose an "inject-and-detect" strategy for editors, where invisible trigger prompts are embedded into papers; if a review repeats or reacts to these triggers, it reveals that the review was generated by an LLM, not a human. This method turns prompt injections from vulnerability into a verification tool. We outline our design, expected model behaviors, and ethical safeguards for deployment. The goal is to expose how fragile today's peer-review process becomes under LLM influence and how editorial awareness can help restore trust in scientific evaluation.

Key Contributions

Hybrid hidden-prompt injection attack combining invisible PDF text, smooth topic-shift transitions, and surrogate-model-based iterative refinement to steer LLM reviewers toward acceptance
Two-layer defense combining document structural reconstruction (dual text-view comparison) and behavioral mutation testing to detect injected prompts without requiring model weights
Editor-injected trigger-trap mechanism that embeds invisible prompts in papers and flags any review that echoes them as LLM-generated, turning prompt injection from vulnerability into a verification tool

🛡️ Threat Analysis

Output Integrity Attack

The editor-side 'inject-and-detect' strategy is fundamentally an AI-generated content detection mechanism: editors embed invisible trigger prompts in manuscripts; if a submitted review echoes those triggers, it is flagged as LLM-generated rather than human-written, directly addressing output authenticity and AI content provenance.

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeteddigital

Applications

llm-assisted peer reviewscientific paper review systems

Read PDF arXiv DOI

ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

LLMs can hide text in other text of the same length

From Similarity to Vulnerability: Key Collision Attack on LLM Semantic Caching

Risk Assessment and Security Analysis of Large Language Models

SoK: Exposing the Generation and Detection Gaps in LLM-Generated Phishing Through Examination of Generation Methods, Content Characteristics, and Countermeasures

RAJ-PGA: Reasoning-Activated Jailbreak and Principle-Guided Alignment Framework for Large Reasoning Models

Memory Poisoning Attack and Defense on Memory Based LLM-Agents

Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents

Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems