Defense · 2026

Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs

Yuxin Liu 1,2, Fei Wang 3,2, Kun Li 4, Yiqi Nie 1,2, Junjie Chen 3,2, Zhangling Duan 2, Zhaohong Jia 1


Published on arXiv (arXiv:2603.17761)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Outperforms strong baselines on diverse benchmarks without LVLM fine-tuning, demonstrating cross-domain generalization

SCEP (Semantic Consistent Evidence Pack)

Novel technique introduced


Image Deepfake Detection (IDD) separates manipulated images from authentic ones by spotting artifacts of synthesis or tampering. Although large vision-language models (LVLMs) offer strong image understanding, adapting them to IDD often demands costly fine-tuning and generalizes poorly to diverse, evolving manipulations. We propose the Semantic Consistent Evidence Pack (SCEP), a training-free LVLM framework that replaces whole-image inference with evidence-driven reasoning. SCEP mines a compact set of suspicious patch tokens that best reveal manipulation cues. It uses the vision encoder's CLS token as a global reference, clusters patch features into coherent groups, and scores patches with a fused metric combining CLS-guided semantic mismatch with frequency- and noise-based anomalies. To cover dispersed traces and avoid redundancy, SCEP samples a few high-confidence patches per cluster and applies grid-based NMS, producing an evidence pack that conditions a frozen LVLM for prediction. Experiments on diverse benchmarks show that SCEP outperforms strong baselines without LVLM fine-tuning.
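The mining pipeline described above can be sketched in a few steps: score each patch token against the CLS token plus auxiliary anomaly scores, cluster patch features, take a few top-scoring patches per cluster, and deduplicate with grid-based NMS. The sketch below is a minimal illustration, not the paper's implementation: the fusion weights (`alpha`, `beta`, `gamma`), the use of k-means as the clustering step, the NMS radius, and all function names are assumptions, and the frequency/noise anomaly scores are taken as precomputed inputs.

```python
import numpy as np

def cosine_mismatch(patch_feats, cls_feat):
    # Semantic mismatch: 1 - cosine similarity to the CLS (global) token.
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    c = cls_feat / np.linalg.norm(cls_feat)
    return 1.0 - p @ c

def kmeans(feats, k, iters=10, seed=0):
    # Minimal k-means as a stand-in for the paper's clustering step.
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels

def grid_nms(indices, scores, coords, radius=1):
    # Greedy grid-based NMS: keep the highest-scoring patch, then
    # suppress any patch within `radius` grid cells (Chebyshev distance).
    order = sorted(indices, key=lambda i: -scores[i])
    kept = []
    for i in order:
        if all(np.abs(coords[i] - coords[j]).max() > radius for j in kept):
            kept.append(i)
    return kept

def mine_evidence_pack(patch_feats, cls_feat, freq_scores, noise_scores,
                       coords, k_clusters=4, per_cluster=3,
                       alpha=0.5, beta=0.25, gamma=0.25, radius=1):
    # Fused suspicion score: CLS-guided semantic mismatch combined with
    # frequency- and noise-based anomaly scores (weights are assumptions).
    score = (alpha * cosine_mismatch(patch_feats, cls_feat)
             + beta * freq_scores + gamma * noise_scores)
    labels = kmeans(patch_feats, k_clusters)
    candidates = []
    for j in range(k_clusters):
        members = np.where(labels == j)[0]
        top = members[np.argsort(-score[members])][:per_cluster]
        candidates.extend(top.tolist())
    return grid_nms(candidates, score, coords, radius)
```

For a 14x14 ViT patch grid, `coords` holds each token's (row, col) cell, and the returned indices form the evidence pack that would condition the frozen LVLM; per-cluster sampling covers dispersed traces while the NMS pass removes spatially redundant picks.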


Key Contributions

  • Training-free LVLM framework (SCEP) that replaces whole-image inference with evidence-driven reasoning via suspicious patch token mining
  • Semantic clustering of patch features with CLS-guided scoring that fuses semantic mismatch with frequency/noise anomalies
  • Grid-based NMS sampling strategy to produce compact evidence packs covering dispersed manipulation traces without redundancy
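Once mined, the evidence pack conditions the frozen LVLM at inference time. A minimal sketch of that conditioning step, assuming 16-pixel ViT patches over a 224x224 image and a generic multi-image query payload; the prompt wording, patch size, and `build_lvlm_query` helper are illustrative assumptions, not the paper's actual interface:

```python
import numpy as np

PATCH = 16  # assumed ViT patch size: 224 / 16 = 14 patches per side

def crop_patch(image, row, col, patch=PATCH):
    # Extract the pixel region behind one suspicious patch token.
    return image[row * patch:(row + 1) * patch,
                 col * patch:(col + 1) * patch]

def build_lvlm_query(image, pack_coords):
    # Assemble an evidence-conditioned query: the full image plus each
    # suspicious crop, with a prompt asking the frozen LVLM to reason
    # over the attached evidence. Wording is illustrative only.
    crops = [crop_patch(image, r, c) for r, c in pack_coords]
    prompt = (
        "Decide whether this image is real or manipulated. "
        f"{len(crops)} suspicious regions are attached as evidence; "
        "inspect them for synthesis or tampering artifacts."
    )
    return {"text": prompt, "images": [image] + crops}
```

Because the LVLM stays frozen, all adaptation lives in this query construction, which is what makes the approach training-free.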

🛡️ Threat Analysis

Output Integrity Attack

Detects AI-generated and manipulated images (deepfakes) to verify content authenticity and integrity — this is output integrity / AI-generated content detection, the core focus of ML09.


Details

Domains
vision, multimodal
Model Types
vlm, transformer
Threat Tags
inference_time
Applications
deepfake detection, image manipulation detection, content authenticity verification