Defense · 2025

Bias Injection Attacks on RAG Databases and Sanitization Defenses

Hao Wu , Prateek Saxena

0 citations · 52 references · arXiv


Published on arXiv · 2512.00804

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

BiasDef reduces adversarial passages retrieved by 15%, mitigates LLM perspective shift by 6.2×, and enables retrieval of 62% more benign passages compared to existing retrieval-based sanitization defenses.

BiasDef

Novel technique introduced


This paper explores attacks and defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily injects false or toxic content, which fact-checking or linguistic analysis easily detects. We reveal a new and subtle threat: bias injection attacks, which insert factually correct yet semantically biased passages into the knowledge base to covertly influence the ideological framing of answers generated by large language models (LLMs). We demonstrate that these adversarial passages, though linguistically coherent and truthful, can systematically crowd out opposing views from the retrieved context and steer LLM answers toward the attacker's intended perspective. We precisely characterize this class of attacks and then develop a post-retrieval filtering defense, BiasDef. We construct a comprehensive benchmark based on public question answering datasets to evaluate both the attack and the defenses. Our results show that: (1) the proposed attack induces significant perspective shifts in LLM answers, effectively evading existing retrieval-based sanitization defenses; and (2) BiasDef outperforms existing methods by reducing adversarial passages retrieved by 15%, which mitigates perspective shift in answers by 6.2×, while enabling the retrieval of 62% more benign passages.
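The crowding-out mechanism described in the abstract can be illustrated with a toy sketch (this is not the paper's code; all vectors, labels, and the 2-dimensional embedding space are invented for illustration):

```python
# Toy illustration of how adversarial passages embedded very close to the
# query can crowd benign perspectives out of the top-k retrieved context.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def top_k(query_vec, passages, k=3):
    """Return the labels of the k passages most similar to the query."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [label for label, _ in ranked[:k]]

query = [1.0, 0.0]
corpus = [
    ("benign-pro", [0.8, 0.6]),   # one perspective on the question
    ("benign-con", [0.7, 0.7]),   # the opposing perspective
    # Adversarial passages: coherent and factually correct, but one-sided,
    # and crafted to embed very close to likely queries.
    ("adv-1", [0.99, 0.05]),
    ("adv-2", [0.98, 0.10]),
    ("adv-3", [0.97, 0.15]),
]

retrieved = top_k(query, corpus, k=3)
print(retrieved)  # the adversarial passages fill every context slot
```

Because the attack needs no falsehoods, per-passage fact-checking never fires; the damage comes purely from which passages win the similarity ranking.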


Key Contributions

  • Characterizes a novel bias injection attack on RAG vector databases that inserts linguistically coherent, factually correct but semantically biased passages to crowd out opposing views and steer LLM answers toward an attacker's perspective
  • Proposes BiasDef, a post-retrieval filtering defense that reduces adversarial passages retrieved by 15% and mitigates perspective shift by 6.2× over existing sanitization baselines
  • Constructs a comprehensive benchmark based on public QA datasets to evaluate both the attack and defenses, including existing retrieval-based sanitization methods
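This summary does not disclose BiasDef's actual algorithm. As a minimal sketch of the general idea of post-retrieval filtering, the following prunes near-duplicate (plausibly coordinated) passages after retrieval so that diverse perspectives survive; the vectors, labels, and redundancy threshold are all assumptions, not the paper's method:

```python
# Illustrative post-retrieval filter: greedily keep relevant passages while
# skipping candidates whose embeddings are nearly identical to one already
# kept, leaving room in the context for opposing views.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def filter_redundant(query_vec, passages, k=3, redundancy=0.95):
    """Rank by query similarity, but skip any candidate whose embedding is
    nearly identical (cosine > redundancy) to an already-kept passage."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    kept = []
    for label, vec in ranked:
        if all(cosine(vec, kept_vec) <= redundancy for _, kept_vec in kept):
            kept.append((label, vec))
        if len(kept) == k:
            break
    return [label for label, _ in kept]

query = [1.0, 0.0]
corpus = [
    ("benign-pro", [0.9, 0.44]),
    ("benign-con", [0.6, 0.8]),
    # Clustered adversarial passages all pushing one perspective
    ("adv-1", [0.99, 0.05]),
    ("adv-2", [0.98, 0.10]),
    ("adv-3", [0.97, 0.15]),
]

selected = filter_redundant(query, corpus, k=3)
print(selected)  # one adversarial passage survives; benign views are restored
```

This kind of filter targets the attack's signature (many mutually similar, one-sided passages) rather than factual errors, which is why fact-checking-style defenses miss the attack entirely.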

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted
Applications
retrieval-augmented generation, question answering, llm knowledge bases