
INSIGHT: An Interpretable Neural Vision-Language Framework for Reasoning of Generative Artifacts

Anshul Bagaria

0 citations · 105 references · arXiv


Published on arXiv · 2511.22351

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

INSIGHT outperforms prior detectors and black-box VLM baselines across animals, vehicles, and abstract synthetic scenes under severe downsampling and compression degradation.

INSIGHT

Novel technique introduced


The growing realism of AI-generated images produced by recent GAN and diffusion models has intensified concerns over the reliability of visual media. Yet, despite notable progress in deepfake detection, current forensic systems degrade sharply under real-world conditions such as severe downsampling, compression, and cross-domain distribution shifts. Moreover, most detectors operate as opaque classifiers, offering little insight into why an image is flagged as synthetic, undermining trust and hindering adoption in high-stakes settings. We introduce INSIGHT (Interpretable Neural Semantic and Image-based Generative-forensic Hallucination Tracing), a unified multimodal framework for robust detection and transparent explanation of AI-generated images, even at extremely low resolutions (16×16–64×64). INSIGHT combines hierarchical super-resolution for amplifying subtle forensic cues without inducing misleading artifacts, Grad-CAM driven multi-scale localization to reveal spatial regions indicative of generative patterns, and CLIP-guided semantic alignment to map visual anomalies to human-interpretable descriptors. A vision-language model is then prompted using a structured ReAct + Chain-of-Thought protocol to produce consistent, fine-grained explanations, verified through a dual-stage G-Eval + LLM-as-a-judge pipeline to minimize hallucinations and ensure factuality. Across diverse domains, including animals, vehicles, and abstract synthetic scenes, INSIGHT substantially improves both detection robustness and explanation quality under extreme degradation, outperforming prior detectors and black-box VLM baselines. Our results highlight a practical path toward transparent, reliable AI-generated image forensics and establish INSIGHT as a step forward in trustworthy multimodal content verification.
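The CLIP-guided semantic alignment step can be illustrated with a minimal sketch: rank human-interpretable descriptors by cosine similarity to the embedding of a localized anomaly. The descriptor names, the 3-d vectors, and the `align_to_descriptors` helper below are all illustrative stand-ins — INSIGHT would compute these embeddings in CLIP's joint image-text space, not with toy vectors.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def align_to_descriptors(anomaly_embedding, descriptor_embeddings):
    """Return descriptor names ranked by similarity to the anomaly embedding.

    Stand-in for CLIP-guided alignment: in the real pipeline, the anomaly
    embedding would come from a Grad-CAM-localized image region and the
    descriptor embeddings from CLIP's text encoder.
    """
    ranked = sorted(
        descriptor_embeddings.items(),
        key=lambda kv: cosine(anomaly_embedding, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked]

# Hypothetical descriptors with toy stand-in embeddings.
descriptors = {
    "unnatural fur texture": [0.9, 0.1, 0.0],
    "inconsistent lighting": [0.1, 0.9, 0.1],
    "warped background geometry": [0.0, 0.2, 0.9],
}
anomaly = [0.8, 0.2, 0.1]  # stand-in for a localized patch embedding
print(align_to_descriptors(anomaly, descriptors)[0])  # → unnatural fur texture
```

The ranking, rather than a single hard label, matters here: the downstream VLM prompt can cite the top few descriptors as candidate explanations for the flagged region.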


Key Contributions

  • Hierarchical super-resolution pipeline that amplifies forensic cues at extremely low resolutions (16×16–64×64) without inducing misleading artifacts
  • Grad-CAM-driven multi-scale spatial localization combined with CLIP-guided semantic alignment to map visual anomalies to human-interpretable descriptors
  • ReAct + Chain-of-Thought VLM prompting with dual-stage G-Eval + LLM-as-a-judge verification to produce factual, hallucination-minimized forensic explanations
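The dual-stage verification in the last contribution reduces to a simple gate: an explanation is released only if a G-Eval-style score clears a threshold and a judge model confirms grounding. The sketch below stubs both signals with plain arguments — in the paper's pipeline each would come from a prompted LLM evaluator, and the threshold value here is an assumption, not one reported by the authors.

```python
def dual_stage_verify(explanation, geval_score, judge_grounded, threshold=0.7):
    """Gate a candidate forensic explanation with two sequential checks,
    mirroring a G-Eval + LLM-as-a-judge pipeline. Both inputs are stubs:
    a real system would obtain the score and the verdict from prompted
    LLM evaluators rather than pass them in directly.
    """
    if geval_score < threshold:
        return {"accepted": False, "reason": "low G-Eval score"}
    if not judge_grounded:
        return {"accepted": False, "reason": "judge flagged hallucination"}
    return {"accepted": True, "explanation": explanation}

# Stubbed evaluator outputs for one hypothetical explanation.
result = dual_stage_verify(
    "checkerboard upsampling artifacts concentrated around the ears",
    geval_score=0.85,    # stand-in for a prompted G-Eval faithfulness score
    judge_grounded=True, # stand-in for the judge's grounding verdict
)
print(result["accepted"])  # → True
```

Ordering the cheap scalar check before the judge call is a natural design choice: obviously weak explanations are rejected without spending a second model invocation.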

🛡️ Threat Analysis

Output Integrity Attack

Contributes a novel AI-generated image detection architecture for verifying content authenticity and provenance, placing it squarely within ML09's scope of deepfake detection and output integrity.


Details

Domains
vision, multimodal
Model Types
vlm, diffusion, gan, transformer
Threat Tags
inference_time, digital
Applications
ai-generated image detection, deepfake forensics, visual content verification