
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

Haorui He 1,2, Yupeng Li 1,2, Bin Benjamin Zhu 3, Dacheng Wen 1,2, Reynold Cheng 2, Francis C. M. Lau 2

Published on arXiv: 2508.06059

Data Poisoning Attack

OWASP ML Top 10 — ML02

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Fact2Fiction achieves 8.9%–21.2% higher attack success rates than PoisonedRAG across varying poisoning budgets against the DEFAME and InFact agentic fact-checking systems.

Fact2Fiction

Novel technique introduced


State-of-the-art (SOTA) fact-checking systems combat misinformation by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanations for the verdicts). The security of these systems is crucial, as compromised fact-checkers can amplify misinformation, yet it remains largely underexplored. To bridge this gap, this work introduces a novel threat model against such fact-checking systems and presents Fact2Fiction, the first poisoning attack framework targeting SOTA agentic fact-checking systems. Fact2Fiction employs LLMs to mimic the system's decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%–21.2% higher attack success rates than SOTA attacks across various poisoning budgets, exposing security weaknesses in existing fact-checking systems and highlighting the need for defensive countermeasures.
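The decompose → verify → aggregate flow described in the abstract can be sketched as follows. This is a toy illustration only: the helper functions stand in for LLM agent calls, splitting on " and " is a crude proxy for LLM-based claim decomposition, and none of the names reflect DEFAME's or InFact's actual interfaces.

```python
def decompose(claim):
    # An LLM agent would split the claim into sub-claims;
    # splitting on " and " is a toy stand-in.
    return [part.strip() for part in claim.split(" and ")]

def verify(sub_claim, corpus):
    # A real agent retrieves and reasons over evidence; here we just
    # count corpus snippets that mention the sub-claim and are tagged
    # as supporting ("TRUE") or refuting ("FALSE").
    support = sum(1 for doc in corpus if sub_claim in doc and "TRUE" in doc)
    refute = sum(1 for doc in corpus if sub_claim in doc and "FALSE" in doc)
    verdict = support >= refute  # ties (including no evidence) default to True
    justification = f"'{sub_claim}': {support} supporting vs. {refute} refuting"
    return verdict, justification

def fact_check(claim, corpus):
    # Aggregate: the overall claim holds only if every sub-claim does,
    # and the per-sub-claim justifications are returned alongside.
    results = [verify(sc, corpus) for sc in decompose(claim)]
    return all(v for v, _ in results), [j for _, j in results]
```

Because the aggregate verdict is a conjunction over sub-claims, flipping even one sub-claim's verification flips the whole verdict, which is exactly the surface the attack targets.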


Key Contributions

  • First poisoning attack framework (Fact2Fiction) targeting agentic, claim-decomposition-based LLM fact-checking systems rather than naive single-pass RAG systems
  • Novel threat model that exploits system-generated justifications to identify vulnerable sub-claims and craft targeted malicious evidence aligned with the system's own reasoning
  • Empirical demonstration of 8.9%–21.2% higher attack success rates over PoisonedRAG with only 6.3%–12.5% of the malicious evidence budget, exposing a transparency-security trade-off in justification-producing fact-checkers

🛡️ Threat Analysis

Data Poisoning Attack

The attack injects crafted malicious evidence into the retrieval corpus (knowledge base) used by the agentic fact-checking system: a data/knowledge-store poisoning attack that corrupts the information source the model relies on at inference time.
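A minimal sketch of this poisoning step, assuming the corpus is a plain list of evidence strings. All function names and the budget-allocation heuristic here are illustrative assumptions, not Fact2Fiction's real interface or strategy.

```python
def mimic_decomposition(claim):
    # Fact2Fiction uses an LLM to mimic the target system's decomposer;
    # splitting on " and " is a toy stand-in.
    return [part.strip() for part in claim.split(" and ")]

def craft_malicious_evidence(sub_claim, justification):
    # Tailor the fabricated document to the system-generated
    # justification so it directly counters the system's own reasoning.
    return f"{sub_claim}. FALSE. (Counters: {justification})"

def poison_corpus(corpus, claim, justifications, budget=2):
    # Spend the limited poisoning budget on a subset of sub-claims
    # (here naively the first `budget`; the real attack targets those
    # the justifications expose as most vulnerable).
    targets = list(zip(mimic_decomposition(claim), justifications))[:budget]
    for sub_claim, just in targets:
        corpus.append(craft_malicious_evidence(sub_claim, just))
    return corpus
```

The budget parameter mirrors the paper's setting: the attacker can only insert a small number of documents, so choosing which sub-claims to target matters more than volume.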


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time, targeted
Datasets
AVeriTeC
Applications
automated fact-checking, misinformation detection, rag-based llm agents