Defense · 2025

Detecting AI-Generated Images via Diffusion Snap-Back Reconstruction: A Forensic Approach

Mohd Ruhul Ameen, Akif Islam

0 citations · 21 references · arXiv


Published on arXiv · 2511.00352

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 0.993 AUROC under stratified five-fold cross-validation and 0.990 AUROC on a holdout split, using only a logistic regression classifier on 15-dimensional diffusion snap-back features.

Diffusion Snap-Back

Novel technique introduced


The rapid rise of generative diffusion models has made distinguishing authentic visual content from synthetic imagery increasingly challenging. Traditional deepfake detection methods, which rely on frequency or pixel-level artifacts, fail against modern text-to-image systems such as Stable Diffusion and DALL-E that produce photorealistic and artifact-free results. This paper introduces a diffusion-based forensic framework that leverages multi-strength image reconstruction dynamics, termed diffusion snap-back, to identify AI-generated images. By analysing how reconstruction metrics (LPIPS, SSIM, and PSNR) evolve across varying noise strengths, we extract interpretable manifold-based features that differentiate real and synthetic images. Evaluated on a balanced dataset of 4,000 images, our approach achieves 0.993 AUROC under cross-validation and remains robust to common distortions such as compression and noise. Despite using limited data and a single diffusion backbone (Stable Diffusion v1.5), the proposed method demonstrates strong generalization and interpretability, offering a foundation for scalable, model-agnostic synthetic media forensics.
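The probe described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: `reconstruct(img, s)` stands in for a diffusion img2img call (e.g., Stable Diffusion v1.5 run at denoising strength `s`), the strength grid is an assumed example, and only PSNR is computed here for self-containedness; a full implementation would also record LPIPS (`lpips` package) and SSIM (`scikit-image`).

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, peak]."""
    mse = float(np.mean((a - b) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

def snapback_curve(img, reconstruct, strengths=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Reconstruct `img` at each noise strength and record the metric trajectory.

    `reconstruct(img, s)` is assumed to noise the image at strength `s` and
    denoise it with a pre-trained diffusion model (img2img). The intuition is
    that synthetic images lie on the model's manifold and "snap back" to
    themselves more faithfully than real images do.
    """
    return [psnr(img, reconstruct(img, s)) for s in strengths]
```

With the `diffusers` library, `reconstruct` would wrap an img2img pipeline call at the given `strength`; any callable with that signature works, which keeps the probe model-agnostic.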


Key Contributions

  • Diffusion snap-back framework: uses a pre-trained diffusion img2img pipeline as a forensic probe by analyzing how reconstruction quality metrics (LPIPS, SSIM, PSNR) evolve across multiple noise strengths
  • Compact 15-dimensional manifold-aligned feature vector combining multi-strength perceptual metrics with trajectory descriptors (AUC-LPIPS, delta-LP, knee-step) for interpretable classification
  • Lightweight logistic regression classifier achieving 0.993 AUROC on 4,000 images with demonstrated robustness to JPEG compression and additive noise
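The trajectory descriptors named above can be summarized from a per-strength LPIPS curve. The definitions below (trapezoidal area for AUC-LPIPS, endpoint difference for delta-LP, point of maximum curvature for knee-step) are plausible reconstructions for illustration, not the paper's verbatim formulas:

```python
import numpy as np

def trajectory_descriptors(strengths, lpips_vals):
    """Summarize an LPIPS-vs-strength trajectory into scalar features.

    Illustrative guesses at the paper's descriptors:
    - auc_lpips: trapezoidal area under the LPIPS curve
    - delta_lp:  net rise in LPIPS from the weakest to the strongest noise
    - knee_step: strength where the curve bends most (max |second difference|)
    """
    s = np.asarray(strengths, dtype=float)
    y = np.asarray(lpips_vals, dtype=float)
    auc = float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)
    delta = float(y[-1] - y[0])
    if len(y) >= 3:
        knee = float(s[np.argmax(np.abs(np.diff(y, 2))) + 1])
    else:
        knee = float(s[0])
    return {"auc_lpips": auc, "delta_lp": delta, "knee_step": knee}
```

Concatenating such descriptors with the raw multi-strength metrics yields a compact feature vector that a logistic regression classifier (e.g., scikit-learn's `LogisticRegression`) can separate, which matches the paper's emphasis on interpretability over heavy learned detectors.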

🛡️ Threat Analysis

Output Integrity Attack

Introduces a novel AI-generated image detection framework. Deepfake and synthetic image detection is a core ML09 use case (output integrity and content provenance), and the paper's primary contribution is a new forensic technique, not merely the application of existing detectors to a new domain.


Details

Domains
vision, generative
Model Types
diffusion, traditional_ml
Threat Tags
inference_time, digital
Datasets
Custom balanced dataset (4,000 images: 2,000 real, 2,000 AI-generated via Stable Diffusion/DALL-E)
Applications
ai-generated image detection, deepfake detection, synthetic media forensics