defense 2026

ForensicFormer: Hierarchical Multi-Scale Reasoning for Cross-Domain Image Forgery Detection

Hema Hariharan Samson

0 citations · 54 references · arXiv


Published on arXiv

2601.08873

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 86.8% average accuracy across 7 diverse test sets and 0.76 F1-score for pixel-level forgery localization, outperforming prior universal detectors by >11% on out-of-distribution data.

ForensicFormer

Novel technique introduced


The proliferation of AI-generated imagery and sophisticated editing tools has rendered traditional forensic methods ineffective for cross-domain forgery detection. We present ForensicFormer, a hierarchical multi-scale framework that unifies low-level artifact detection, mid-level boundary analysis, and high-level semantic reasoning via cross-attention transformers. Unlike prior single-paradigm approaches, which achieve <75% accuracy on out-of-distribution datasets, our method maintains 86.8% average accuracy across seven diverse test sets spanning traditional manipulations, GAN-generated images, and diffusion model outputs, a significant improvement over state-of-the-art universal detectors. We demonstrate superior robustness to JPEG compression (83% accuracy at Q=70 vs. 66% for baselines) and provide pixel-level forgery localization with a 0.76 F1-score. Extensive ablation studies validate that each hierarchical component contributes a 4-10% accuracy improvement, and qualitative analysis reveals interpretable forensic features aligned with human expert reasoning. Our work bridges classical image forensics and modern deep learning, offering a practical solution for real-world deployment where manipulation techniques are unknown a priori.
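
The abstract says the three levels are unified "via cross-attention transformers" but gives no architectural detail. As a rough illustration only, the sketch below shows generic single-head scaled dot-product cross-attention in NumPy, where one set of tokens (here labelled semantic) queries another (here labelled artifact). The token counts, dimensions, random weights, and the choice of which level queries which are all assumptions made for this example, not the paper's design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d_model) tokens that ask the questions
    context: (n_c, d_model) tokens that are attended over
    Returns the fused tokens (n_q, d_head) and the attention map (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical sizes and random inputs, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
semantic_tokens = rng.normal(size=(4, d_model))   # assumed high-level tokens
artifact_tokens = rng.normal(size=(9, d_model))   # assumed low-level tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))

fused, attn = cross_attention(semantic_tokens, artifact_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of `attn` sums to one, so every querying token receives a weighted average of the context values. A real model would add multiple heads, learned weights, residual connections, and normalization.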


Key Contributions

  • Hierarchical multi-scale framework unifying low-level DCT/DWT artifact analysis, mid-level boundary inconsistency detection, and high-level semantic coherence reasoning via cross-attention transformers
  • Multi-task training that jointly optimizes binary classification, pixel-level forgery mask prediction, and manipulation-type classification to force spatially grounded, cross-domain forensic features
  • 86.8% average accuracy across 7 diverse out-of-distribution test sets (traditional manipulations, GAN, and diffusion outputs) with 83% accuracy under JPEG Q=70 compression versus 66% for CNN baselines
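
The multi-task objective in the second contribution can be pictured as a weighted sum of three losses. The sketch below is a minimal illustration under assumed choices: binary cross-entropy for the real/forged decision and for the pixel mask, categorical cross-entropy for the manipulation type, and equal weights. The source does not state the actual loss functions or weights, and the function names and toy values here are hypothetical.

```python
import numpy as np

def binary_cross_entropy(prob, target, eps=1e-7):
    # Mean BCE over all elements; clipping avoids log(0).
    prob = np.clip(prob, eps, 1 - eps)
    return float(np.mean(-(target * np.log(prob) + (1 - target) * np.log(1 - prob))))

def cross_entropy(probs, label, eps=1e-7):
    # Negative log-probability assigned to the correct class.
    return float(-np.log(np.clip(probs[label], eps, 1.0)))

def multitask_loss(p_forged, y_forged, mask_prob, mask_true,
                   type_probs, type_label,
                   w_cls=1.0, w_mask=1.0, w_type=1.0):
    """Weighted sum of the three task losses (weights are assumptions)."""
    l_cls = binary_cross_entropy(np.array([p_forged]), np.array([y_forged]))
    l_mask = binary_cross_entropy(mask_prob, mask_true)
    l_type = cross_entropy(type_probs, type_label)
    total = w_cls * l_cls + w_mask * l_mask + w_type * l_type
    return total, (l_cls, l_mask, l_type)

# Toy example: one forged image with a 2x2 mask and three manipulation types.
mask_prob = np.array([[0.8, 0.2], [0.1, 0.9]])   # predicted per-pixel forgery probability
mask_true = np.array([[1.0, 0.0], [0.0, 1.0]])   # ground-truth forgery mask
type_probs = np.array([0.7, 0.2, 0.1])           # predicted manipulation-type distribution

total, (l_cls, l_mask, l_type) = multitask_loss(
    p_forged=0.9, y_forged=1.0,
    mask_prob=mask_prob, mask_true=mask_true,
    type_probs=type_probs, type_label=0,
)
print(f"cls={l_cls:.4f} mask={l_mask:.4f} type={l_type:.4f} total={total:.4f}")
```

Because the mask term is averaged over pixels, it only lowers the total when the prediction is correct at the right locations, which is one plausible reading of how joint training pushes features to be spatially grounded.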

🛡️ Threat Analysis

Output Integrity Attack

ForensicFormer is a novel AI-generated content detection architecture targeting GAN outputs, diffusion model images, and traditional image manipulations — directly addressing output integrity and authenticity verification of AI-generated imagery.


Details

Domains
vision
Model Types
transformer, cnn
Threat Tags
inference_time, digital
Datasets
CASIA2, NIST16, DEFACTO, ForenSynths, DiffusionDB, RAISE
Applications
image forgery detection, ai-generated image detection, deepfake detection