defense 2026

RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection

Song-Duo Ma , Yi-Hung Liu , Hsin-Yu Lin , Pin-Yu Chen , Hong-Yan Huang , Shau-Yung Hsu , Yun-Nung Chen

0 citations · 40 references · arXiv


Published on arXiv

2601.03981

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

RADAR achieves 86.98% ROC-AUC on a fake news detection benchmark, significantly outperforming general-purpose LLMs with retrieval augmentation.

RADAR (Verbal Adversarial Feedback)

Novel technique introduced


To efficiently combat the spread of LLM-generated misinformation, we present RADAR, a retrieval-augmented detector with adversarial refinement for robust fake news detection. Our approach employs a generator that rewrites real articles with factual perturbations, paired with a lightweight detector that verifies claims using dense passage retrieval. To enable effective co-evolution, we introduce verbal adversarial feedback (VAF). Rather than relying on scalar rewards, VAF issues structured natural-language critiques; these guide the generator toward more sophisticated evasion attempts, compelling the detector to adapt and improve. On a fake news detection benchmark, RADAR achieves 86.98% ROC-AUC, significantly outperforming general-purpose LLMs with retrieval. Ablation studies confirm that detector-side retrieval yields the largest gains, while VAF and few-shot demonstrations provide critical signals for robust training.
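The generator–detector co-evolution described above can be sketched as a simple loop. This is a hypothetical, runnable illustration, not the authors' implementation: the function names (`generate_fake`, `detect`, `vaf_round`) are stand-ins, and the LLM calls are mocked with string transforms so the control flow of verbal adversarial feedback is visible.

```python
# Illustrative sketch of RADAR-style co-evolution with verbal
# adversarial feedback (VAF). All names are stand-ins; the LLM
# generator and retrieval-based detector are mocked.

def generate_fake(article: str, critiques: list[str]) -> str:
    """Generator: rewrite a real article with a factual perturbation,
    conditioning on past natural-language critiques (mocked here)."""
    # A real system would prompt an LLM with the article + critiques.
    return article.replace("2019", "2021")  # inject a factual error

def detect(article: str, retrieved: list[str]) -> tuple[bool, str]:
    """Detector: verify claims against retrieved evidence (mocked).
    A real system would use dense passage retrieval and an LLM judge."""
    contradicted = any(ev not in article for ev in retrieved)
    critique = ("A numeric claim was altered; cross-check dates "
                "against retrieved evidence." if contradicted else "")
    return contradicted, critique

def vaf_round(real_article: str, evidence: list[str],
              critiques: list[str]) -> tuple[bool, list[str]]:
    """One adversarial round: generate, detect, and feed the verbal
    critique (rather than a scalar reward) back to the generator."""
    fake = generate_fake(real_article, critiques)
    caught, critique = detect(fake, evidence)
    if critique:
        critiques = critiques + [critique]
    return caught, critiques

article = "The summit took place in 2019 in Geneva."
evidence = ["took place in 2019"]
caught, critiques = vaf_round(article, evidence, [])
print(caught, len(critiques))  # → True 1
```

The key design choice mirrored here is that the feedback channel carries a structured textual critique, which the generator can condition on directly in its next attempt, instead of a single reward scalar.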


Key Contributions

  • RADAR framework: a retrieval-augmented detector paired with an adversarial LLM generator for co-evolutionary training on fake news detection
  • Verbal Adversarial Feedback (VAF): structured natural-language critiques that replace scalar rewards to guide the generator toward more sophisticated evasion, compelling detector improvement
  • Demonstrates that detector-side dense passage retrieval yields the largest performance gains, achieving 86.98% ROC-AUC on a fake news benchmark
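The detector-side retrieval step the ablations single out can be sketched as follows. This is a minimal stand-in, assuming a DPR-style setup in which claims and passages are embedded and ranked by similarity; the bag-of-words "embedding" below substitutes for a learned dense encoder so the example runs without model weights.

```python
# Minimal stand-in for detector-side passage retrieval: embed the
# claim and candidate passages, rank by cosine similarity. A real
# dense retriever would use learned encoders instead of word counts.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def score(q: Counter, p: Counter) -> float:
    dot = sum(q[t] * p[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in p.values())))
    return dot / norm if norm else 0.0

def retrieve(claim: str, passages: list[str], k: int = 1) -> list[str]:
    q = embed(claim)
    ranked = sorted(passages, key=lambda p: score(q, embed(p)),
                    reverse=True)
    return ranked[:k]

passages = [
    "The treaty was signed in Vienna in 1998.",
    "Local elections were held across the region.",
]
print(retrieve("When was the treaty signed?", passages))
# → ['The treaty was signed in Vienna in 1998.']
```

The retrieved passages then serve as evidence against which the detector checks the article's factual claims.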

🛡️ Threat Analysis

Output Integrity Attack

The paper's primary contribution is detecting LLM-generated misinformation, a form of AI-generated text detection. It proposes a novel adversarial training framework, Verbal Adversarial Feedback, to harden the detector against sophisticated LLM-generated content, directly targeting output integrity and content authenticity.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
fake news detection, llm-generated misinformation detection