Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems
Minseok Kim, Hankook Lee, Hyungjoon Koo
Published on arXiv: 2511.01268
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
RAGDefender reduces ASR against Gemini from 0.89 to 0.02 under a 4x adversarial-to-legitimate passage ratio, far outperforming RobustRAG (0.69) and Discern-and-Answer (0.24).
RAGDefender
Novel technique introduced
Large language models (LLMs) are reshaping numerous facets of our daily lives, leading to widespread adoption as web-based services. Despite their versatility, LLMs face notable challenges, such as generating hallucinated content and lacking access to up-to-date information. To address these limitations, Retrieval-Augmented Generation (RAG) has emerged as a promising direction, grounding generated responses in external knowledge sources. A typical RAG system consists of i) a retriever that fetches a set of relevant passages from a knowledge base and ii) a generator that formulates a response based on the retrieved content. However, as with other AI systems, recent studies demonstrate that RAG is vulnerable to attacks such as knowledge corruption, in which misleading information is injected into the knowledge base. In response, several defense strategies have been proposed, including having LLMs inspect the retrieved passages individually or fine-tuning robust retrievers. While effective, such approaches often come with substantial computational costs. In this work, we introduce RAGDefender, a resource-efficient defense mechanism against knowledge corruption (i.e., data poisoning) attacks in practical RAG deployments. RAGDefender operates during the post-retrieval phase, leveraging lightweight machine learning techniques to detect and filter out adversarial content without requiring additional model training or inference. Our empirical evaluations show that RAGDefender consistently outperforms existing state-of-the-art defenses across multiple models and adversarial scenarios: e.g., RAGDefender reduces the attack success rate (ASR) against the Gemini model from 0.89 to as low as 0.02, compared to 0.69 for RobustRAG and 0.24 for Discern-and-Answer, when adversarial passages outnumber legitimate ones by a factor of four (4x).
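The abstract does not spell out which lightweight ML technique RAGDefender uses in the post-retrieval phase. As a rough illustration of a defense in this spirit, the sketch below clusters passage embeddings and drops the suspiciously tight cluster, on the assumption that poisoned passages crafted to force a single target answer tend to be mutually similar. The function name `filter_passages` and the tightness heuristic are illustrative assumptions, not the paper's actual method, and the embeddings here are synthetic stand-ins for real encoder outputs.

```python
import numpy as np

def filter_passages(embeddings, n_iter=50):
    """Hypothetical post-retrieval filter (NOT the paper's algorithm).

    Heuristic sketch: split the retrieved passages' embeddings into two
    clusters via cosine k-means (k=2) and keep the looser cluster,
    assuming injected passages form an unusually tight group.
    Returns the indices of the passages that are kept.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize
    # Initialize centers with a far-apart pair for a stable split.
    c0 = X[0]
    c1 = X[int(np.argmin(X @ c0))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        labels = np.argmax(X @ centers.T, axis=1)  # assign by cosine sim
        for k in range(2):
            if np.any(labels == k):
                c = X[labels == k].mean(axis=0)
                centers[k] = c / np.linalg.norm(c)
    # Intra-cluster dispersion: mean distance of members to their center.
    disp = [np.mean(np.linalg.norm(X[labels == k] - centers[k], axis=1))
            if np.any(labels == k) else np.inf
            for k in range(2)]
    keep = int(np.argmax(disp))  # drop the tighter (suspect) cluster
    return [i for i in range(len(X)) if labels[i] == keep]
```

Note that this runs entirely outside the LLM, matching the abstract's point that the defense requires no extra model training or inference calls; the real system presumably uses a more principled detector than this two-cluster heuristic.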
Key Contributions
- RAGDefender: a resource-efficient post-retrieval defense that uses lightweight ML classifiers to detect and filter adversarial passages without requiring additional model training or LLM inference calls
- Empirical demonstration that RAGDefender outperforms state-of-the-art RAG defenses (RobustRAG, Discern-and-Answer) across multiple LLMs under high adversarial passage ratios
- Practical defense design that reduces attack success rate against Gemini from 0.89 to 0.02 even when adversarial passages outnumber legitimate ones 4-to-1