RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection
Song-Duo Ma, Yi-Hung Liu, Hsin-Yu Lin, Pin-Yu Chen, Hong-Yan Huang, Shau-Yung Hsu, Yun-Nung Chen
Published on arXiv (2601.03981)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
RADAR achieves 86.98% ROC-AUC on a fake news detection benchmark, significantly outperforming general-purpose LLMs with retrieval augmentation.
RADAR (Verbal Adversarial Feedback)
Novel technique introduced
To efficiently combat the spread of LLM-generated misinformation, we present RADAR, a retrieval-augmented detector with adversarial refinement for robust fake news detection. Our approach employs a generator that rewrites real articles with factual perturbations, paired with a lightweight detector that verifies claims using dense passage retrieval. To enable effective co-evolution, we introduce verbal adversarial feedback (VAF). Rather than relying on scalar rewards, VAF issues structured natural-language critiques; these guide the generator toward more sophisticated evasion attempts, compelling the detector to adapt and improve. On a fake news detection benchmark, RADAR achieves 86.98% ROC-AUC, significantly outperforming general-purpose LLMs with retrieval. Ablation studies confirm that detector-side retrieval yields the largest gains, while VAF and few-shot demonstrations provide critical signals for robust training.
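The co-evolution described above can be sketched as a loop in which the generator and detector take turns, with the detector's natural-language critique (rather than a scalar reward) conditioning the generator's next attempt. The following is a minimal toy sketch, not the paper's implementation: `generate`, `detect`, and `vaf_loop` are hypothetical stand-ins (string perturbation and substring matching in place of LLM rewriting and dense passage retrieval) that illustrate only the control flow of verbal adversarial feedback.

```python
def generate(article: str, critique: str = "") -> str:
    # Stand-in for the adversarial LLM generator: rewrite a real article
    # with a factual perturbation, conditioned on the last critique.
    note = f" [adapting to critique: {critique}]" if critique else ""
    return article.replace("2024", "2019") + note

def detect(article: str, evidence: list[str]) -> tuple[float, str]:
    # Stand-in for the retrieval-augmented detector: count claims the
    # retrieved evidence does not support, then verbalise the finding
    # instead of emitting only a scalar reward.
    unsupported = [e for e in evidence if e not in article]
    score = len(unsupported) / max(len(evidence), 1)  # fake-probability proxy
    critique = f"flagged {len(unsupported)} unsupported claim(s)"
    return score, critique

def vaf_loop(real_article: str, evidence: list[str], rounds: int = 3):
    critique, history = "", []
    for _ in range(rounds):
        fake = generate(real_article, critique)   # generator move
        score, critique = detect(fake, evidence)  # detector move + VAF
        history.append((fake, score, critique))
    return history

history = vaf_loop("The summit took place in 2024.", ["2024"])
```

In the real framework both roles are LLMs and the critique is structured, but the turn-taking shape is the same: each round yields a harder fake and a fresh verbal signal for the next round.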
Key Contributions
- RADAR framework: a retrieval-augmented detector paired with an adversarial LLM generator for co-evolutionary training on fake news detection
- Verbal Adversarial Feedback (VAF): structured natural-language critiques that replace scalar rewards to guide the generator toward more sophisticated evasion, compelling detector improvement
- Demonstrates that detector-side dense passage retrieval yields the largest performance gains, achieving 86.98% ROC-AUC on a fake news benchmark
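Since the headline metric is ROC-AUC, it helps to recall what the 86.98% figure means: the probability that the detector scores a randomly chosen fake article above a randomly chosen real one. A minimal, dependency-free computation (illustrative only, not the paper's evaluation code):

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    # ROC-AUC via the Mann-Whitney U statistic: the probability that a
    # random positive (fake, label 1) is scored above a random negative
    # (real, label 0), counting ties as half a win.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of fakes from reals gives 1.0; chance level is 0.5.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # -> 1.0
```

An AUC of 0.8698 therefore means that in roughly 87 of 100 random fake/real pairs, the fake article receives the higher fake-probability score.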
🛡️ Threat Analysis
The paper's primary contribution is defensive: detecting LLM-generated misinformation, a form of AI-generated text detection. Its novel adversarial training framework, Verbal Adversarial Feedback, hardens the detector against increasingly sophisticated LLM-generated content, directly targeting output integrity and content authenticity.