Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking Systems
Haorui He 1,2, Yupeng Li 1,2, Bin Benjamin Zhu 3, Dacheng Wen 1,2, Reynold Cheng 2, Francis C. M. Lau 2
Published on arXiv
2508.06059
Data Poisoning Attack
OWASP ML Top 10 — ML02
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Fact2Fiction achieves 8.9%–21.2% higher attack success rates than PoisonedRAG across varying poisoning budgets against DEFAME and InFact agentic fact-checking systems.
Fact2Fiction
Novel technique introduced
State-of-the-art (SOTA) fact-checking systems combat misinformation by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results into verdicts with justifications (explanations for the verdicts). The security of these systems is crucial, as compromised fact-checkers can amplify misinformation, yet it remains largely underexplored. To bridge this gap, this work introduces a novel threat model against such fact-checking systems and presents Fact2Fiction, the first poisoning attack framework targeting SOTA agentic fact-checking systems. Fact2Fiction employs LLMs to mimic the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%–21.2% higher attack success rates than SOTA attacks across various poisoning budgets, exposing security weaknesses in existing fact-checking systems and highlighting the need for defensive countermeasures.
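The decompose-verify-aggregate pipeline described above can be sketched with deterministic stand-ins for the LLM calls. The function names, the split-on-"and" decomposition, and the substring-matching "retrieval" below are illustrative assumptions, not the actual DEFAME/InFact implementation:

```python
def decompose(claim: str) -> list[str]:
    """Stand-in for LLM-based decomposition: split a compound claim
    into independently checkable sub-claims (hypothetical heuristic)."""
    return [part.strip() for part in claim.split(" and ")]

def verify(sub_claim: str, corpus: list[str]) -> tuple[bool, str]:
    """Stand-in for retrieval + LLM verification: a sub-claim counts as
    'supported' if some corpus document contains it verbatim, and the
    justification cites that evidence."""
    for doc in corpus:
        if sub_claim.lower() in doc.lower():
            return True, f"Supported by: {doc!r}"
    return False, "No supporting evidence found"

def fact_check(claim: str, corpus: list[str]) -> tuple[str, list[str]]:
    """Aggregate per-sub-claim results into an overall verdict plus
    the justifications the system would surface to users."""
    results = [verify(sc, corpus) for sc in decompose(claim)]
    verdict = "TRUE" if all(ok for ok, _ in results) else "FALSE"
    return verdict, [j for _, j in results]
```

Because the verdict here is a conjunction over sub-claims, corrupting the evidence behind any single sub-claim can flip the overall result, which is exactly the attack surface Fact2Fiction targets.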
Key Contributions
- First poisoning attack framework (Fact2Fiction) targeting agentic, claim-decomposition-based LLM fact-checking systems rather than naive single-pass RAG systems
- Novel threat model that exploits system-generated justifications to identify vulnerable sub-claims and craft targeted malicious evidence aligned with the system's own reasoning
- Empirical demonstration of 8.9%–21.2% higher attack success rates over PoisonedRAG with only 6.3%–12.5% of the malicious evidence budget, exposing a transparency-security trade-off in justification-producing fact-checkers
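The efficiency gain above comes from spending the poisoning budget where the system is weakest. A minimal sketch of justification-guided budget allocation follows; the confidence scores, the `1 - confidence` weighting, and the function name are assumptions for illustration, not the paper's exact algorithm:

```python
def allocate_budget(confidences: dict[str, float], budget: int) -> dict[str, int]:
    """Split a fixed poisoning budget across sub-claims, spending more on
    those the fact-checker's justifications suggest it is least sure about.
    `confidences` maps sub-claim -> estimated system confidence in [0, 1]."""
    weights = {sc: 1.0 - c for sc, c in confidences.items()}
    total = sum(weights.values()) or 1.0
    # Proportional allocation, rounded down per sub-claim.
    alloc = {sc: int(budget * w / total) for sc, w in weights.items()}
    # Hand leftover units (lost to rounding) to the weakest sub-claims first.
    leftover = budget - sum(alloc.values())
    for sc in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        alloc[sc] += 1
    return alloc
```

Under this heuristic a sub-claim the system verifies confidently receives few or no malicious documents, while a shakily justified one absorbs most of the budget, which is one way a small budget (6.3%–12.5% of the baseline's) can still outperform uniform poisoning.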
🛡️ Threat Analysis
The attack injects crafted malicious evidence into the retrieval corpus (knowledge base) used by the agentic fact-checking system: a data/knowledge-store poisoning attack that corrupts the information source the model consults at inference time.
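A minimal sketch of why injected documents reach the verifier, assuming a simple lexical-overlap retriever as a stand-in for the system's actual retrieval (the documents, sub-claim, and scoring are invented for illustration):

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap(query: str, doc: str) -> int:
    """Bag-of-words overlap: a crude stand-in for embedding similarity."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    return sum((q & d).values())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents the verifier would read as evidence."""
    return sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)[:k]

# Benign knowledge base (hypothetical).
corpus = [
    "Vaccine X completed phase 3 trials with a strong safety record.",
    "Regulators approved vaccine X after independent review.",
]

# The attacker crafts evidence that echoes the targeted sub-claim's wording,
# so a lexical (or embedding) retriever ranks it above the genuine documents.
sub_claim = "vaccine X safety record in phase 3 trials"
malicious = ("Leaked report: vaccine X phase 3 trials in review hid "
             "an alarming safety record.")
corpus.append(malicious)
```

Because retrieval rewards similarity to the query rather than truthfulness, evidence tailored to the sub-claim's phrasing crowds genuine documents out of the verifier's context window.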