Training-Free Multimodal Deepfake Detection via Graph Reasoning
Yuxin Liu 1, Fei Wang 1,2,3, Kun Li 4, Yiqi Nie 3, Junjie Chen 3, Yanyan Wei 1, Zhangling Duan 3, Zhaohong Jia 1
Published on arXiv: 2509.21774
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
GASP-ICL outperforms strong baselines across four forgery types without any LVLM fine-tuning, demonstrating robust generalization in training-free multimodal deepfake detection.
Novel technique introduced: GASP-ICL
Multimodal deepfake detection (MDD) aims to uncover manipulations across visual, textual, and auditory modalities, thereby reinforcing the reliability of modern information systems. Although large vision-language models (LVLMs) exhibit strong multimodal reasoning, their effectiveness in MDD is limited by challenges in capturing subtle forgery cues, resolving cross-modal inconsistencies, and performing task-aligned retrieval. To this end, we propose Guided Adaptive Scorer and Propagation In-Context Learning (GASP-ICL), a training-free framework for MDD. GASP-ICL employs a pipeline to preserve semantic relevance while injecting task-aware knowledge into LVLMs. We leverage an MDD-adapted feature extractor to retrieve aligned image-text pairs and build a candidate set. We further design the Graph-Structured Taylor Adaptive Scorer (GSTAS) to capture cross-sample relations and propagate query-aligned signals, producing discriminative exemplars. This enables precise selection of semantically aligned, task-relevant demonstrations, enhancing LVLMs for robust MDD. Experiments on four forgery types show that GASP-ICL surpasses strong baselines, delivering gains without LVLM fine-tuning.
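The retrieval step described above (using image and text features to build a candidate set of aligned image-text pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes embeddings have already been produced by some feature extractor, and the function names, the weighting parameter `alpha`, and the candidate count `k` are hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def retrieve_candidates(q_img, q_txt, pool_img, pool_txt, k=8, alpha=0.5):
    """Return indices and joint scores of the k pool samples closest to the query.

    q_img, q_txt:       1-D query embeddings (image and text).
    pool_img, pool_txt: 2-D arrays, one row per stored image-text pair.
    alpha:              hypothetical weight of image vs. text similarity.
    """
    sim_img = l2_normalize(pool_img) @ l2_normalize(q_img)
    sim_txt = l2_normalize(pool_txt) @ l2_normalize(q_txt)
    joint = alpha * sim_img + (1.0 - alpha) * sim_txt
    top = np.argsort(-joint)[:k]  # highest joint similarity first
    return top, joint[top]

# Usage with random stand-in embeddings (assumed shapes: 20 pairs, 16 dims).
rng = np.random.default_rng(0)
pool_img = rng.normal(size=(20, 16))
pool_txt = rng.normal(size=(20, 16))
top, scores = retrieve_candidates(pool_img[3], pool_txt[3], pool_img, pool_txt, k=5)
```

Because the query here is a copy of pool item 3, that item ranks first; in practice the query would be an unseen image-text pair and the pool a labeled reference set.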
Key Contributions
- GASP-ICL: a training-free in-context learning framework for multimodal deepfake detection that requires no LVLM fine-tuning
- GSTAS (Graph-Structured Taylor Adaptive Scorer): models cross-sample semantic and structural relations via graph propagation and a Taylor gate mechanism to surface latent manipulation cues
- MDD-oriented joint image-text similarity pipeline that retrieves task-aligned demonstrations for discriminative LVLM prompting
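The idea of scoring candidates through cross-sample relations and graph propagation can be sketched as follows. This is an illustrative stand-in, not GSTAS itself: the summary does not specify the Taylor gate, so it is omitted, and the sketch substitutes plain score propagation with restart (in the style of personalized PageRank) over a cosine-affinity graph. The function names and the parameters `steps`, `beta`, and `n` are assumptions.

```python
import numpy as np

def propagate_scores(cand_feats, query_feat, steps=10, beta=0.5, eps=1e-12):
    """Refine query-candidate scores by diffusing them over a candidate graph.

    beta (hypothetical) controls how much score flows in from neighbours
    versus how much of the initial query similarity is retained.
    """
    feats = cand_feats / (np.linalg.norm(cand_feats, axis=1, keepdims=True) + eps)
    q = query_feat / (np.linalg.norm(query_feat) + eps)
    s0 = np.clip(feats @ q, 0.0, None)         # initial query-aligned scores
    adj = np.clip(feats @ feats.T, 0.0, None)  # non-negative affinities
    np.fill_diagonal(adj, 0.0)                 # no self-loops
    row_sum = adj.sum(axis=1, keepdims=True)
    # Row-normalise; rows with no neighbours stay all-zero.
    trans = np.divide(adj, row_sum, out=np.zeros_like(adj), where=row_sum > 0)
    s = s0.copy()
    for _ in range(steps):
        s = (1.0 - beta) * s0 + beta * (trans @ s)
    return s

def select_demonstrations(cand_feats, query_feat, n=4):
    """Pick the n candidates with the highest propagated scores."""
    scores = propagate_scores(cand_feats, query_feat)
    return np.argsort(-scores)[:n]
```

Under this sketch, a candidate that is only moderately similar to the query can still rank highly if it is strongly connected to other query-aligned candidates, which is the intuition behind using cross-sample structure rather than pairwise similarity alone.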
🛡️ Threat Analysis
This work directly addresses the detection of AI-generated and manipulated content across multiple modalities (visual, text, audio). GASP-ICL is a novel detection architecture for deepfake content integrity and provenance: rather than applying an existing detector to a new domain, it contributes a new graph-structured scoring and ICL-based retrieval method for this task.