RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
Vasilije Stambolic 1,2, Aritra Dhar 2, Lukas Cavigelli 2
Published on arXiv (2510.11195)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Combined query-and-target UTF character perturbations achieve near-perfect retrieval redirection in RAG-based code generation systems, causing LLMs to produce code with exploitable vulnerabilities such as RCE and SQL injection.
RAG-Pull
Novel technique introduced
Retrieval-Augmented Generation (RAG) improves the reliability and trustworthiness of LLM responses and reduces hallucination by adding external data to the LLM's context, without requiring model retraining. We develop a new class of black-box attack, RAG-Pull, that inserts hidden UTF characters into queries or external code repositories, redirecting retrieval toward malicious code and thereby breaking the model's safety alignment. We observe that query or code perturbations alone can shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull's minimal perturbations can thus alter a model's safety alignment and increase its preference for unsafe code, opening up a new class of attacks on LLMs.
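To illustrate why such perturbations are imperceptible, here is a minimal sketch (not the paper's actual method; the query string and the specific zero-width character are illustrative assumptions) showing that invisible Unicode format characters leave a string visually unchanged while altering the byte sequence that a retriever or embedding model actually sees:

```python
# Hypothetical illustration: zero-width Unicode characters are invisible
# when rendered, but they change the underlying character sequence, so a
# retriever or embedding model treats the perturbed query as a different input.
ZERO_WIDTH_SPACE = "\u200b"  # U+200B, renders as nothing in most fonts

clean_query = "connect to a sql database in python"
# Insert an invisible character after each space -- the perturbed query
# looks identical on screen.
perturbed_query = clean_query.replace(" ", " " + ZERO_WIDTH_SPACE)

print(clean_query)
print(perturbed_query)                       # visually indistinguishable
print(clean_query == perturbed_query)        # False: the strings differ
print(len(clean_query), len(perturbed_query))  # lengths differ
```

Because the two queries tokenize and embed differently despite rendering identically, an attacker can nudge similarity scores without the user noticing anything in the prompt.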
Key Contributions
- RAG-Pull attack using hidden UTF characters to imperceptibly manipulate query and document similarity, redirecting retrieval toward attacker-controlled malicious code snippets
- Demonstrates that combined query-and-target perturbations achieve near-perfect retrieval success against RAG-based code generation systems
- Shows that retrieved malicious snippets introduce real exploitable vulnerabilities (remote code execution, SQL injection), effectively bypassing LLM safety alignment
🛡️ Threat Analysis
RAG-Pull inserts strategically crafted, imperceptible perturbations (hidden UTF characters) into queries and external code repositories to manipulate retrieval in an LLM-integrated system. This is a clear case of adversarial content manipulation of a RAG/LLM pipeline, which the classification rules explicitly cite as ML01+LLM01 territory (adversarial document injection for RAG).
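A natural mitigation direction (not evaluated in the paper; the filter below is an assumption-laden sketch) is to normalize inputs before retrieval by stripping Unicode "format" characters (general category `Cf`), which covers zero-width spaces, joiners, and bidirectional controls commonly used for invisible perturbations:

```python
import unicodedata

def strip_invisible(text: str) -> str:
    """Remove Unicode format characters (category Cf) from a string.

    Category Cf includes zero-width spaces (U+200B), zero-width joiners
    (U+200D), and bidi control characters -- the kinds of invisible
    characters an attacker could hide in a query or a code snippet.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# A query carrying hidden zero-width characters:
tainted = "select\u200b rows from\u200d a table"
print(strip_invisible(tainted))  # "select rows from a table"
```

Such sanitization would need to run on both user queries and indexed documents to close both perturbation channels the attack exploits; legitimate uses of format characters (e.g., ZWJ in some scripts and emoji) would require allowlisting.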