
Published on arXiv

2512.01335

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Injecting a single emoticon token into a RAG query causes nearly 100% retrieval of semantically unrelated emoticon-containing documents, with adversarial F1-scores exceeding 0.92 across all tested datasets and retrievers

EmoRAG

Novel technique introduced


Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI, enhancing large language model (LLM) faithfulness by incorporating external knowledge. However, our study unveils a critical, overlooked vulnerability: their profound susceptibility to subtle symbolic perturbations, particularly through near-imperceptible emoticon tokens such as "(@_@)", which can catastrophically mislead retrieval; we term this attack EmoRAG. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts that contain a matching emoticon. Our extensive experiments across general question-answering and code domains, using a range of state-of-the-art retrievers and generators, reveal three key findings: (I) Single-Emoticon Disaster: minimal emoticon injections cause maximal disruption, with a single emoticon dominating RAG output in nearly 100% of cases. (II) Positional Sensitivity: placing an emoticon at the beginning of a query causes severe perturbation, with F1-scores exceeding 0.92 across all datasets. (III) Parameter-Scale Vulnerability: counterintuitively, models with more parameters exhibit greater vulnerability to the interference. We provide an in-depth analysis to uncover the underlying mechanisms of these phenomena. Furthermore, we challenge the robustness assumptions of current RAG systems, envisioning a threat scenario in which an adversary exploits this vulnerability to manipulate a RAG system. We evaluate standard defenses and find them insufficient against EmoRAG. To address this, we propose targeted defenses, analyzing their strengths and limitations in mitigating emoticon-based perturbations. Finally, we outline future directions for building robust RAG systems.
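The single-emoticon disaster has a simple geometric reading: if a retriever's representation of an emoticon token is disproportionately strong relative to ordinary words, a query and a document that share only that token end up near-parallel regardless of their semantics. The toy model below is a sketch of that intuition only, not the paper's setup: texts are embedded as sums of random token vectors, and the 10x norm on the emoticon vector is a hypothetical stand-in for the over-weighting EmoRAG exploits in real dense retrievers.

```python
import math
import random

# Toy geometric sketch of the EmoRAG failure mode (NOT a real retriever).
# Illustrative assumption: the emoticon token's vector has a much larger
# norm (10x here) than ordinary word vectors, so it dominates the
# bag-of-token-vectors sum that stands in for a text embedding.
random.seed(0)
DIM = 64

def rand_vec(scale=1.0):
    return [scale * random.gauss(0, 1) for _ in range(DIM)]

vocab = {w: rand_vec() for w in "capital france click link prize".split()}
vocab["(@_@)"] = rand_vec(scale=10.0)  # over-weighted emoticon token (assumption)

def embed(text):
    # Stand-in text embedding: sum of the token vectors.
    vecs = [vocab[t] for t in text.split()]
    return [sum(c) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = "capital france"
poisoned_doc = "(@_@) click link prize"  # semantically unrelated, carries the emoticon
attacked_query = "(@_@) " + query        # single-token injection

print(cosine(embed(query), embed(poisoned_doc)))           # low: unrelated content
print(cosine(embed(attacked_query), embed(poisoned_doc)))  # high: emoticon dominates
```

Because the shared emoticon vector dwarfs the semantic content, the attacked query and the poisoned document become near-parallel, which mirrors the near-100% retrieval rate the paper reports; real retrievers are dense encoders rather than token sums, but the over-weighting intuition carries over.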


Key Contributions

  • Discovery of EmoRAG vulnerability: a single emoticon injected into a query achieves ~100% retrieval poisoning, overriding semantic relevance in state-of-the-art dense and sparse retrievers
  • Systematic characterization of the vulnerability (single-emoticon disaster, positional sensitivity, parameter-scale vulnerability) across QA and code domains
  • Evaluation showing standard defenses are insufficient and proposal of targeted countermeasures with analysis of their limitations

🛡️ Threat Analysis

Input Manipulation Attack

The paper demonstrates adversarial input manipulation of an LLM-integrated RAG system: strategically injecting emoticons into queries or knowledge-base documents to manipulate retrieval output at inference time, explicitly matching the 'adversarial document injection for RAG' subcategory under ML01.
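One obvious first-line mitigation for this threat, illustrative only and not the paper's proposed defense, is to sanitize queries before they reach the retriever by dropping whitespace-delimited tokens that contain no letters or digits, which covers classic emoticons such as "(@_@)" or ":-)":

```python
# Naive query-sanitization sketch (illustrative only; not the paper's
# proposed defense): drop tokens with no alphanumeric characters.
def sanitize_query(query: str) -> str:
    return " ".join(t for t in query.split() if any(c.isalnum() for c in t))

print(sanitize_query("(@_@) what is the capital of france"))
# -> "what is the capital of france"
```

The limitation is immediate in the paper's code domain: symbol-only tokens such as "->" or "&&" can be legitimate query content, so a filter like this over-strips exactly where EmoRAG-style perturbations are hardest to distinguish from real input.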


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box, digital, targeted
Datasets
HotpotQA, TriviaQA, code QA benchmarks
Applications
retrieval-augmented generation, question answering, code generation