
Adversarial Attacks Against Automated Fact-Checking: A Survey

Fanzhen Liu 1,2, Alsharif Abuadbba 2, Kristen Moore 2, Surya Nepal 2,3, Cecile Paris 2, Jia Wu 1, Jian Yang 1, Quan Z. Sheng 1


Published on arXiv: 2509.08463

  • Input Manipulation Attack (OWASP ML Top 10, ML01)
  • Data Poisoning Attack (OWASP ML Top 10, ML02)
  • Prompt Injection (OWASP LLM Top 10, LLM01)

Key Finding

The survey finds that AFC systems are vulnerable to adversarial manipulation at every pipeline stage and that current defenses remain insufficient, underscoring the need for end-to-end adversarially robust FC frameworks.


In an era where misinformation spreads freely, fact-checking (FC) plays a crucial role in verifying claims and promoting reliable information. While automated fact-checking (AFC) has advanced significantly, existing systems remain vulnerable to adversarial attacks that manipulate or generate claims, evidence, or claim-evidence pairs. These attacks can distort the truth, mislead decision-makers, and ultimately undermine the reliability of FC models. Despite growing research interest in adversarial attacks against AFC systems, a comprehensive, holistic overview of the key challenges remains lacking. These challenges include understanding attack strategies, assessing the resilience of current models, and identifying ways to enhance robustness. This survey provides the first in-depth review of adversarial attacks targeting FC, categorizing existing attack methodologies and evaluating their impact on AFC systems. Additionally, we examine recent advances in adversary-aware defenses and highlight open research questions that require further exploration. Our findings underscore the urgent need for resilient FC frameworks capable of withstanding adversarial manipulation while preserving high verification accuracy.


Key Contributions

  • First comprehensive survey categorizing adversarial attacks against automated fact-checking systems into three classes: adversarial claim attacks, adversarial evidence attacks, and adversarial claim-evidence pair attacks.
  • Systematic review of adversary-aware defenses and their effectiveness against known attack strategies.
  • Identification of open research questions and future directions for building resilient AFC frameworks.

🛡️ Threat Analysis

Input Manipulation Attack

Adversarial claim attacks (e.g., paraphrasing, multi-hop reformulation) are inference-time input manipulations designed to cause AFC models to produce incorrect verdicts; at their core, they are evasion attacks on NLP classifiers.
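As a minimal, purely illustrative sketch of such an evasion attack: the toy "classifier" and synonym table below are hypothetical stand-ins (real AFC models and the survey's attack methods are far more sophisticated), but they show how a meaning-preserving rewording of a claim at inference time can flip a brittle verdict.

```python
# Toy sketch of an inference-time claim-perturbation (evasion) attack.
# Both the classifier and the synonym table are hypothetical, for
# illustration only; real attacks use learned paraphrase models.

def toy_verdict(claim: str) -> str:
    """Hypothetical keyword-based AFC stand-in."""
    watched = {"vaccine", "election", "fraud"}
    return "REFUTED" if set(claim.lower().split()) & watched else "SUPPORTED"

# Adversarial paraphrase: swap watched words for near-synonyms
# that the brittle classifier does not cover.
SYNONYMS = {"vaccine": "inoculation", "fraud": "irregularities"}

def perturb(claim: str) -> str:
    return " ".join(SYNONYMS.get(w.lower(), w) for w in claim.split())

claim = "vaccine fraud occurred"
print(toy_verdict(claim))           # REFUTED
print(toy_verdict(perturb(claim)))  # SUPPORTED: verdict flipped
```

The point of the sketch is only that the perturbed claim keeps its meaning for a human reader while crossing the model's decision boundary, which is the defining property of an evasion attack.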

Data Poisoning Attack

Adversarial evidence attacks that inject manipulated or fabricated evidence into the retrieval corpus represent data/corpus poisoning, corrupting the information the AFC model relies on for verdict prediction.
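A minimal sketch of corpus poisoning in a retrieve-then-verify setting: the lexical-overlap retriever and all passages below are invented for illustration (real AFC retrievers are typically dense or learned), but they show how a fabricated passage crafted to match a claim's wording can outrank genuine evidence and corrupt what the verdict model sees.

```python
# Toy sketch of evidence/corpus poisoning against a hypothetical
# retrieve-then-verify pipeline. Retriever and passages are invented
# for illustration; real systems use dense/learned retrieval.

def overlap_score(query: str, passage: str) -> int:
    """Count shared lowercase tokens between query and passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, corpus: list[str]) -> str:
    """Return the passage with the highest lexical overlap."""
    return max(corpus, key=lambda p: overlap_score(query, p))

corpus = ["The city opened a new bridge in 2020 after full inspections"]
claim = "The new city bridge failed safety inspections"

print(retrieve(claim, corpus))  # genuine evidence retrieved

# Attacker injects fabricated evidence worded to mirror the claim.
poison = "The new city bridge failed safety inspections reports confirm"
corpus.append(poison)

print(retrieve(claim, corpus) == poison)  # poisoned passage now wins
```

The fabricated passage needs no factual basis; it only needs to score higher under the retriever, after which the downstream verdict is conditioned on attacker-controlled text.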


Details

Domains: nlp, multimodal
Model Types: llm, transformer
Threat Tags: inference_time, training_time, targeted, digital
Datasets: FEVER
Applications: automated fact-checking, misinformation detection, claim verification