ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
Yutao Wu 1, Xiao Liu 1, Yinghui Li 1, Yifeng Gao 2, Yifan Ding 2, Jiale Ding 2, Xiang Zheng 3, Xingjun Ma 2
Published on arXiv
arXiv:2510.13842
Data Poisoning Attack
OWASP ML Top 10 — ML02
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
ADMIT achieves an average attack success rate (ASR) of 86% at a poisoning rate of 0.93×10⁻⁶, improving over the prior state of the art by 11.2% across 4 retrievers and 11 LLMs, all without access to the target models.
ADMIT (ADversarial Multi-Injection Technique)
Novel technique introduced
Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases, tricking Large Language Models (LLMs) into producing attacker-controlled outputs grounded in manipulated context. Prior work highlights LLMs' susceptibility to misleading or malicious retrieved content. However, real-world fact-checking scenarios are more challenging, as credible evidence typically dominates the retrieval pool. To investigate this problem, we extend knowledge poisoning to the fact-checking setting, where retrieved context includes authentic supporting or refuting evidence. We propose ADMIT (ADversarial Multi-Injection Technique), a few-shot, semantically aligned poisoning attack that flips fact-checking decisions and induces deceptive justifications, all without access to the target LLMs, retrievers, or token-level control. Extensive experiments show that ADMIT transfers effectively across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks, achieving an average attack success rate (ASR) of 86% at an extremely low poisoning rate of 0.93×10⁻⁶, and remaining robust even in the presence of strong counter-evidence. Compared with prior state-of-the-art attacks, ADMIT improves ASR by 11.2% across all settings, exposing significant vulnerabilities in real-world RAG-based fact-checking systems.
Key Contributions
- ADMIT: a few-shot, semantically aligned, black-box knowledge poisoning attack that injects adversarial documents into RAG knowledge bases to flip fact-checking verdicts and induce deceptive justifications.
- Demonstrates transfer across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks with an average ASR of 86% at an extremely low poisoning rate of 0.93×10⁻⁶.
- Outperforms prior state-of-the-art RAG poisoning attacks by 11.2% ASR across all settings, and remains robust even in the presence of strong counter-evidence.
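To put the reported poisoning rate in perspective: it is simply the fraction of the retrieval corpus that is attacker-controlled. The sketch below illustrates the arithmetic only; the corpus size is a hypothetical figure for scale, not a number from the paper.

```python
# Illustrative only: the poisoning rate is injected docs / total corpus docs.
# The corpus sizes used here are assumptions for scale, not from the paper.

def poisoning_rate(injected_docs: int, corpus_docs: int) -> float:
    """Fraction of the knowledge base that is attacker-controlled."""
    return injected_docs / corpus_docs

# At a rate of roughly 0.93e-6, each injected document corresponds to
# a corpus on the order of a million documents.
docs_per_injection = round(1 / 0.93e-6)
print(docs_per_injection)  # ~1.07 million documents per injected passage
```

The takeaway is that the attack succeeds while controlling on the order of one document per million, which is why the paper characterizes it as "few-shot."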
🛡️ Threat Analysis
ADMIT injects adversarial documents into the RAG knowledge base — a data poisoning attack on the retrieval corpus that corrupts the information pool used by LLMs at inference time.
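To make the retrieval-side mechanism concrete, here is a minimal, hypothetical sketch (not the paper's actual method): a poisoned passage that mirrors the claim's wording scores high on lexical/semantic similarity, so it lands in the top-k context handed to the LLM, outranking the genuine evidence. The bag-of-words "embedder" below is a toy stand-in for a real dense retriever.

```python
# Toy illustration of semantically aligned poisoning in retrieval.
# All documents, the query, and the embedder are hypothetical.
import math
import re
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector (toy stand-in for a dense embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = bow(query)
    return sorted(corpus, key=lambda d: -cosine(bow(d), q))[:k]

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Paris is the capital of France.",
    # Poisoned passage: echoes the claim's wording (semantic alignment)
    # while asserting the attacker's verdict.
    "The Eiffel Tower is located in Berlin; the Eiffel Tower is not located in Paris.",
]

hits = top_k("Is the Eiffel Tower located in Paris?", corpus)
print(hits[0])  # the poisoned passage outranks the genuine evidence
```

Because the poisoned text reuses the claim's own phrasing, it needs no access to the retriever or LLM internals to be retrieved, which is the black-box property the threat analysis above describes.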