Defense · 2025

DF-LLaVA: Unlocking MLLMs for Synthetic Image Detection via Knowledge Injection and Conflict-Driven Self-Reflection

Zhuokang Shen 1, Kaisen Zhang 1, Bohan Jia 1, Heming Jia 2, Yuan Fang 1, Zhou Yu 3,1, Shaohui Lin 1,2



Published on arXiv: 2509.14957

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

DF-LLaVA exceeds dedicated expert model accuracy on synthetic image detection benchmarks while providing artifact-level natural language explanations of forgery reasoning.

DF-LLaVA

Novel technique introduced


With the increasing prevalence of synthetic images, evaluating image authenticity and locating forgeries accurately while maintaining human interpretability remains a challenging task. Existing detection models primarily focus on simple authenticity classification, ultimately providing only a forgery probability or binary judgment, which offers limited explanatory insights into image authenticity. Moreover, while MLLM-based detection methods can provide more interpretable results, they still lag behind expert models in terms of pure authenticity classification accuracy. To address this, we propose DF-LLaVA, a novel and effective framework that unlocks the intrinsic discrimination potential of MLLMs. Our approach first mines latent knowledge from the MLLM itself and then injects it into the model via fine-tuning. During inference, conflict signals arising from the model's predictions activate a self-reflection process, leading to the final refined responses. This framework allows LLaVA to achieve outstanding detection accuracy exceeding expert models while still maintaining the interpretability offered by MLLMs. Extensive experiments confirm the superiority of DF-LLaVA, achieving both high accuracy and explainability in synthetic image detection. Code is available online at: https://github.com/Eliot-Shen/DF-LLaVA.
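The abstract describes a two-stage inference procedure: the model produces both a verbal authenticity judgment and a discrimination signal, and a self-reflection step is triggered only when the two conflict. The following is a minimal sketch of that control flow, not the paper's implementation; the `Prediction` structure, the confidence threshold, and the `self_reflect` resolution rule are all hypothetical stand-ins (the actual method re-prompts the MLLM during reflection).

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str        # "real" or "fake" from the model's text response
    fake_prob: float  # score from the injected discrimination knowledge


def detect_conflict(pred: Prediction, threshold: float = 0.5) -> bool:
    """A conflict arises when the verbal label disagrees with the
    decision implied by the probability signal."""
    prob_label = "fake" if pred.fake_prob >= threshold else "real"
    return prob_label != pred.label


def self_reflect(pred: Prediction) -> Prediction:
    """Toy resolution rule (assumption, not the paper's): when the
    probability signal is confident, trust it over the verbal label;
    otherwise keep the original response."""
    if abs(pred.fake_prob - 0.5) > 0.3:
        resolved = "fake" if pred.fake_prob >= 0.5 else "real"
        return Prediction(label=resolved, fake_prob=pred.fake_prob)
    return pred


def detect(pred: Prediction) -> str:
    """Conflict-driven inference: reflect only when signals disagree."""
    if detect_conflict(pred):
        pred = self_reflect(pred)
    return pred.label
```

The key design point this sketch illustrates is that reflection is gated rather than unconditional: consistent predictions pass through unchanged, so the extra refinement cost is paid only on the conflicting cases where it can help.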


Key Contributions

  • Knowledge injection technique that mines and fine-tunes latent synthetic image discrimination knowledge from the MLLM itself
  • Conflict-driven self-reflection mechanism that activates when prediction signals conflict, refining final detection responses
  • DF-LLaVA framework that enables LLaVA to surpass expert-model accuracy on synthetic image detection while preserving natural-language interpretability

🛡️ Threat Analysis

Output Integrity Attack

Primary contribution is AI-generated/synthetic image detection — a core ML09 output integrity concern. Proposes a novel detection architecture that identifies and localizes image forgeries, directly addressing content authenticity verification.


Details

Domains
vision, multimodal
Model Types
VLM, transformer
Threat Tags
inference_time
Datasets
FakeBench, LOKI
Applications
synthetic image detection, deepfake detection, digital forensics