
From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Shahroz Tariq 1, Simon S. Woo 2, Priyanka Singh 3, Irena Irmalasari 3, Saakshi Gupta 3, Dev Gupta 3


Published on arXiv (2508.07596)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves competitive deepfake detection performance on DF40 while producing natural language explanations aligned with Grad-CAM saliency maps, making forensic reasoning accessible to non-expert users

DF-P2E

Novel technique introduced


The proliferation of deepfake technologies poses urgent challenges and serious risks to digital integrity, particularly within critical sectors such as forensics, journalism, and the legal system. While existing detection systems have made significant progress in classification accuracy, they typically function as black-box models, offering limited transparency and minimal support for human reasoning. This lack of interpretability hinders their usability in real-world decision-making contexts, especially for non-expert users. In this paper, we present DF-P2E (Deepfake: Prediction to Explanation), a novel multimodal framework that integrates visual, semantic, and narrative layers of explanation to make deepfake detection interpretable and accessible. The framework consists of three modular components: (1) a deepfake classifier with Grad-CAM-based saliency visualisation, (2) a visual captioning module that generates natural language summaries of manipulated regions, and (3) a narrative refinement module that uses a fine-tuned Large Language Model (LLM) to produce context-aware, user-sensitive explanations. We instantiate and evaluate the framework on the DF40 benchmark, the most diverse deepfake dataset to date. Experiments demonstrate that our system achieves competitive detection performance while providing high-quality explanations aligned with Grad-CAM activations. By unifying prediction and explanation in a coherent, human-aligned pipeline, this work offers a scalable approach to interpretable deepfake detection, advancing the broader vision of trustworthy and transparent AI systems in adversarial media environments.
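The three-module pipeline described above can be sketched as a simple composition of stages. This is a structural illustration only: the function names, signatures, and stub return values below are hypothetical stand-ins, not the paper's implementation; in DF-P2E the real classifier, captioner, and fine-tuned LLM would sit behind these interfaces.

```python
# Structural sketch of the DF-P2E prediction-to-explanation pipeline.
# All bodies are placeholder stubs -- the actual models (deepfake
# classifier, visual captioner, fine-tuned LLM) would replace them.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # "real" or "fake"
    confidence: float  # classifier score
    saliency: list     # Grad-CAM heatmap over the input image

def classify_with_saliency(image) -> Detection:
    # (1) deepfake classifier with Grad-CAM saliency (stub)
    return Detection(label="fake", confidence=0.93,
                     saliency=[[0.0, 0.8], [0.1, 0.2]])

def caption_salient_regions(image, det: Detection) -> str:
    # (2) visual captioning of the highlighted manipulated regions (stub)
    return "blending artefacts around the mouth and jawline"

def refine_narrative(caption: str, det: Detection) -> str:
    # (3) LLM refinement into a context-aware, user-facing explanation (stub)
    return (f"This image is likely {det.label} "
            f"({det.confidence:.0%} confidence): the model focused on {caption}.")

def explain(image) -> str:
    det = classify_with_saliency(image)
    caption = caption_salient_regions(image, det)
    return refine_narrative(caption, det)

print(explain(image=None))
```

The key design point this sketch captures is modularity: each stage consumes only the previous stage's output, so any component (e.g. the captioner) can be swapped without retraining the others.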


Key Contributions

  • DF-P2E: a three-module pipeline combining a Grad-CAM-based deepfake classifier, a visual captioning module generating natural language descriptions of manipulated regions, and a fine-tuned LLM narrative refinement module for user-sensitive explanations
  • First framework to unify prediction and multimodal explanation layers for deepfake detection targeted at non-expert users
  • Evaluation on DF40, the most diverse deepfake benchmark, demonstrating competitive detection performance alongside high-quality, Grad-CAM-aligned explanations
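The saliency layer in module (1) is Grad-CAM. A minimal sketch of the core computation is below, using random arrays in place of real conv-layer activations and gradients; shapes and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM: weight each channel's activation map by the spatially
    averaged gradient of the class score w.r.t. that channel, sum over
    channels, and keep only positive evidence (ReLU)."""
    # feature_maps, gradients: shape (C, H, W) from the last conv layer
    weights = gradients.mean(axis=(1, 2))              # alpha_c, shape (C,)
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                         # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalise to [0, 1]
    return cam

# toy example: 8 channels of 7x7 activations and gradients of the "fake" logit
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 7, 7))
dA = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(A, dA)
```

The resulting heatmap is upsampled to the input resolution and overlaid on the face image, which is what the captioning module in stage (2) describes in natural language.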

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection — specifically deepfake detection — which is a core ML09 concern around output integrity and content authenticity. The framework contributes new detection explainability components (Grad-CAM saliency, visual captioning, LLM narrative refinement) evaluated on the DF40 benchmark.


Details

Domains
vision, nlp, multimodal
Model Types
cnn, transformer, llm
Threat Tags
inference_time, digital
Datasets
DF40
Applications
deepfake detection, digital forensics, multimedia authentication