defense 2026

Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention

Sagie Dekel , Moshe Tennenholtz , Oren Kurland

0 citations · 45 references · arXiv (Cornell University)


Published on arXiv

2602.04711

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

SDAG substantially reduces attack success rate relative to standard causal attention across multiple corpus poisoning strategies, and yields statistically significant improvements when integrated with existing SOTA RAG defenses.

SDAG (Sparse Document Attention RAG)

Novel technique introduced


Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to be quite vulnerable to corpus knowledge poisoning: an attacker injects misleading documents into the corpus to steer an LLM's output toward an undesired response. We argue that the standard causal attention mechanism in LLMs enables harmful cross-document interactions, specifically in cases of attacks. Accordingly, we introduce a novel defense approach for RAG: Sparse Document Attention RAG (SDAG). This is a block-sparse attention mechanism that disallows cross-attention between retrieved documents. SDAG requires only a minimal inference-time change to the attention mask; no fine-tuning or additional architectural changes are needed. We present an empirical evaluation of LLM-based question answering (QA) under a variety of attack strategies on RAG. We show that SDAG yields a substantially lower attack success rate than the standard causal attention mechanism. We further demonstrate the clear merits of integrating SDAG with state-of-the-art RAG defense methods: the integration results in performance that is statistically significantly better than the state-of-the-art.


Key Contributions

  • Identifies that standard causal attention enables harmful cross-document interactions in RAG that amplify corpus poisoning attacks
  • Proposes SDAG (Sparse Document Attention RAG): a block-sparse attention mask that disallows cross-attention between retrieved documents, requiring no fine-tuning or architectural changes
  • Demonstrates SDAG substantially reduces attack success rate and yields statistically significant gains when combined with existing state-of-the-art RAG defenses
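The core mechanism is easy to sketch. A minimal illustration, assuming a prompt layout of [prefix][doc_1]…[doc_k][question] with known token-span boundaries (the paper's exact layout and implementation may differ; `sdag_mask` and its arguments are hypothetical names):

```python
import numpy as np

def sdag_mask(seq_len, doc_spans):
    """Build a boolean attention mask (True = attention allowed).

    Starts from standard causal (lower-triangular) attention, then
    zeroes out cross-document entries so tokens in one retrieved
    document cannot attend to tokens in any other retrieved document.
    Prefix and question tokens are unaffected.
    """
    # Standard causal mask: each token attends to itself and earlier tokens.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    for qi, (qs, qe) in enumerate(doc_spans):
        for ki, (ks, ke) in enumerate(doc_spans):
            if qi != ki:
                # Block attention from document qi's tokens to document ki's tokens.
                mask[qs:qe, ks:ke] = False
    return mask

# Example: 12 tokens, two retrieved documents at token spans [2, 5) and [5, 8).
m = sdag_mask(12, [(2, 5), (5, 8)])
assert not m[6, 3]  # a doc-2 token cannot see a doc-1 token
assert m[6, 1]      # but it still sees the shared prefix
assert m[10, 3]     # question tokens still see every document
```

The mask is applied at inference time only, which matches the paper's claim that no fine-tuning or architectural change is needed: a poisoned document can no longer condition the representations of the benign documents, only its own.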

🛡️ Threat Analysis

Input Manipulation Attack

Corpus knowledge poisoning injects adversarially crafted documents into the RAG retrieval corpus to manipulate LLM outputs at inference time. The taxonomy explicitly identifies adversarial document injection against RAG as an ML01 (Input Manipulation) case that also warrants the LLM01 (Prompt Injection) tag.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, targeted, black_box
Applications
question answering, rag systems, llm-based information retrieval