ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
Yutao Wu 1, Xiao Liu 1, Yinghui Li 1, Yifeng Gao 2, Yifan Ding 2, Jiale Ding 2, Xiang Zheng 3, Xingjun Ma 2
Published on arXiv
arXiv:2510.13842
Data Poisoning Attack
OWASP ML Top 10 — ML02
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
ADMIT achieves an average attack success rate (ASR) of 86% at a poisoning rate of 0.93×10⁻⁶, improving over the prior state of the art by 11.2% across 4 retrievers and 11 LLMs, all without access to the target models.
ADMIT (ADversarial Multi-Injection Technique)
Novel technique introduced
Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases, tricking Large Language Models (LLMs) into producing attacker-controlled outputs grounded in manipulated context. Prior work highlights LLMs' susceptibility to misleading or malicious retrieved content. However, real-world fact-checking scenarios are more challenging, as credible evidence typically dominates the retrieval pool. To investigate this problem, we extend knowledge poisoning to the fact-checking setting, where retrieved context includes authentic supporting or refuting evidence. We propose ADMIT (ADversarial Multi-Injection Technique), a few-shot, semantically aligned poisoning attack that flips fact-checking decisions and induces deceptive justifications, all without access to the target LLMs, retrievers, or token-level control. Extensive experiments show that ADMIT transfers effectively across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks, achieving an average attack success rate (ASR) of 86% at an extremely low poisoning rate of 0.93×10⁻⁶, and remaining robust even in the presence of strong counter-evidence. Compared with prior state-of-the-art attacks, ADMIT improves ASR by 11.2% across all settings, exposing significant vulnerabilities in real-world RAG-based fact-checking systems.
Key Contributions
- ADMIT: a few-shot, semantically aligned, black-box knowledge poisoning attack that injects adversarial documents into RAG knowledge bases to flip fact-checking verdicts and induce deceptive justifications.
- Demonstrates transfer across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks with an average ASR of 86% at an extremely low poisoning rate of 0.93×10⁻⁶.
- Outperforms prior state-of-the-art RAG poisoning attacks by 11.2% ASR across all settings, and remains robust even in the presence of strong counter-evidence.
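To put the reported poisoning rate in perspective: it is simply the fraction of the retrieval corpus that is attacker-controlled. The sketch below illustrates the arithmetic only; the corpus size is a hypothetical figure for scale, not a number from the paper.

```python
# Illustrative only: the poisoning rate is injected docs / total corpus docs.
# The corpus sizes used here are assumptions for scale, not from the paper.

def poisoning_rate(injected_docs: int, corpus_docs: int) -> float:
    """Fraction of the knowledge base that is attacker-controlled."""
    return injected_docs / corpus_docs

# At a rate of roughly 0.93e-6, each injected document corresponds to
# a corpus on the order of a million documents.
docs_per_injection = round(1 / 0.93e-6)
print(docs_per_injection)  # ~1.07 million documents per injected passage
```

The takeaway is that the attack succeeds while controlling on the order of one document per million, which is why the paper characterizes it as "few-shot."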
🛡️ Threat Analysis
ADMIT injects adversarial documents into the RAG knowledge base — a data poisoning attack on the retrieval corpus that corrupts the information pool used by LLMs at inference time.
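To make the retrieval-side mechanism concrete, here is a minimal, hypothetical sketch (not the paper's actual method): a poisoned passage that mirrors the claim's wording scores high on lexical/semantic similarity, so it lands in the top-k context handed to the LLM, outranking the genuine evidence. The bag-of-words "embedder" below is a toy stand-in for a real dense retriever.

```python
# Toy illustration of semantically aligned poisoning in retrieval.
# All documents, the query, and the embedder are hypothetical.
import math
import re
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector (toy stand-in for a dense embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = bow(query)
    return sorted(corpus, key=lambda d: -cosine(bow(d), q))[:k]

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Paris is the capital of France.",
    # Poisoned passage: echoes the claim's wording (semantic alignment)
    # while asserting the attacker's verdict.
    "The Eiffel Tower is located in Berlin; the Eiffel Tower is not located in Paris.",
]

hits = top_k("Is the Eiffel Tower located in Paris?", corpus)
print(hits[0])  # the poisoned passage outranks the genuine evidence
```

Because the poisoned text reuses the claim's own phrasing, it needs no access to the retriever or LLM internals to be retrieved, which is the black-box property the threat analysis above describes.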