MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning
Zhihui Chen 1, Kai He 1, Qingyuan Lei 2,3, Bin Pu 3, Jian Zhang 4, Yuling Xu 5, Mengling Feng 1
Published on arXiv: 2603.18577
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Improves detection accuracy by 7.65% and reduces hallucinations by 16.2% over baselines via Forgery-aware GSPO alignment
MedForge-Reasoner
Novel technique introduced
Text-guided image editors can now manipulate authentic medical scans with high fidelity, enabling lesion implantation/removal that threatens clinical trust and safety. Existing defenses are inadequate for healthcare. Medical detectors are largely black-box, while MLLM-based explainers are typically post-hoc, lack medical expertise, and may hallucinate evidence on ambiguous cases. We present MedForge, a data-and-method solution for pre-hoc, evidence-grounded medical forgery detection. We introduce MedForge-90K, a large-scale benchmark of realistic lesion edits across 19 pathologies with expert-guided reasoning supervision via doctor inspection guidelines and gold edit locations. Building on it, MedForge-Reasoner performs localize-then-analyze reasoning, predicting suspicious regions before producing a verdict, and is further aligned with Forgery-aware GSPO to strengthen grounding and reduce hallucinations. Experiments demonstrate state-of-the-art detection accuracy and trustworthy, expert-aligned explanations.
Key Contributions
- MedForge-90K benchmark: 90K realistic medical lesion edits across 19 pathologies with expert-guided reasoning supervision
- MedForge-Reasoner: MLLM-based detector performing localize-then-analyze reasoning with forgery-aware GSPO alignment
- Pre-hoc grounded explanations that identify suspicious regions before verdict, reducing hallucinations by 16.2%
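The forgery-aware alignment described above rewards the model for grounding its verdict in the correct edit region, penalizing hallucinated evidence on authentic images. A minimal sketch of what such a reward term might look like is below; the function names, box format, and weights are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a forgery-aware reward for GSPO-style alignment.
# All names and weights are illustrative, not taken from the paper.

def box_iou(a, b):
    """IoU between two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def forgery_aware_reward(pred_verdict, gold_verdict, pred_box, gold_box,
                         w_verdict=0.5, w_ground=0.5):
    """Combine verdict correctness with grounding quality.

    Predicting a suspicious region on an authentic image (gold_box is
    None) earns no grounding reward, discouraging hallucinated evidence.
    """
    verdict_r = 1.0 if pred_verdict == gold_verdict else 0.0
    if gold_box is None:            # authentic image: no edit region exists
        ground_r = 1.0 if pred_box is None else 0.0
    elif pred_box is None:          # forged image but localization missed
        ground_r = 0.0
    else:                           # overlap with the gold edit location
        ground_r = box_iou(pred_box, gold_box)
    return w_verdict * verdict_r + w_ground * ground_r
```

In a GSPO-style loop, this scalar would score each sampled localize-then-analyze response within a group, so responses that both classify correctly and point at the true edit region are preferred.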
🛡️ Threat Analysis
Detects AI-generated manipulations (lesion implantation or removal) in medical images produced by text-guided editors. This is deepfake detection and content-authenticity verification, squarely within ML09 (Output Integrity Attack).