ForensicFormer: Hierarchical Multi-Scale Reasoning for Cross-Domain Image Forgery Detection

The proliferation of AI-generated imagery and sophisticated editing tools has rendered traditional forensic methods ineffective for cross-domain forgery detection. We present ForensicFormer, a hierarchical multi-scale framework that unifies low-level artifact detection, mid-level boundary analysis, and high-level semantic reasoning via cross-attention transformers. Unlike prior single-paradigm approaches, which achieve below 75% accuracy on out-of-distribution datasets, our method maintains 86.8% average accuracy across seven diverse test sets spanning traditional manipulations, GAN-generated images, and diffusion model outputs, a significant improvement over state-of-the-art universal detectors. We demonstrate superior robustness to JPEG compression (83% accuracy at Q=70 vs. 66% for baselines) and provide pixel-level forgery localization with a 0.76 F1-score. Extensive ablation studies validate that each hierarchical component contributes a 4-10% accuracy improvement, and qualitative analysis reveals interpretable forensic features aligned with human expert reasoning. Our work bridges classical image forensics and modern deep learning, offering a practical solution for real-world deployment where manipulation techniques are unknown a priori.
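To make the cross-attention fusion idea concrete, here is a minimal sketch in plain Python of scaled dot-product cross-attention, in which tokens from one level of the hierarchy query tokens from another. It is an illustration under stated assumptions, not the paper's implementation: the token values, the two-dimensional feature size, the names `semantic_tokens` and `artifact_tokens`, and the omission of learned projection matrices are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: list of d-dim vectors (assumed here: high-level semantic tokens)
    keys, values: lists of vectors (assumed here: low-level artifact tokens)
    Returns the fused outputs and the attention weights per query.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy tokens: 2 semantic queries attend over 3 artifact tokens.
semantic_tokens = [[1.0, 0.0], [0.0, 1.0]]
artifact_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(semantic_tokens, artifact_tokens, artifact_tokens)
```

Each row of `attn` is a probability distribution over the artifact tokens, so each fused vector is a convex combination of low-level features selected by a high-level query. A real model would add learned query, key, and value projections and multiple heads.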

Key Contributions

Hierarchical multi-scale framework unifying low-level DCT/DWT artifact analysis, mid-level boundary inconsistency detection, and high-level semantic coherence reasoning via cross-attention transformers
Multi-task training jointly optimizing binary classification, pixel-level forgery mask prediction, and manipulation-type classification to force spatially grounded cross-domain forensic features
86.8% average accuracy across 7 diverse out-of-distribution test sets (traditional manipulations, GAN, and diffusion outputs) with 83% accuracy under JPEG Q=70 compression versus 66% for CNN baselines
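The multi-task objective named in the second contribution can be sketched as a weighted sum of three losses. This is a rough illustration only: the summary does not give the loss functions or weights, so the choice of binary cross-entropy for the image label and mask, categorical cross-entropy for the manipulation type, and the weights `w_cls`, `w_mask`, and `w_type` are all assumptions.

```python
import math

EPS = 1e-7  # clamp to avoid log(0)


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mask_bce(pred_mask, true_mask):
    """Mean per-pixel binary cross-entropy over a 2D forgery mask."""
    losses = [bce(p, y)
              for pred_row, true_row in zip(pred_mask, true_mask)
              for p, y in zip(pred_row, true_row)]
    return sum(losses) / len(losses)


def cross_entropy(probs, label):
    """Categorical cross-entropy for the manipulation-type head."""
    return -math.log(max(probs[label], EPS))


def multitask_loss(p_forged, y_forged, pred_mask, true_mask,
                   type_probs, type_label,
                   w_cls=1.0, w_mask=1.0, w_type=0.5):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    parts = {
        "cls": bce(p_forged, y_forged),
        "mask": mask_bce(pred_mask, true_mask),
        "type": cross_entropy(type_probs, type_label),
    }
    total = (w_cls * parts["cls"] + w_mask * parts["mask"]
             + w_type * parts["type"])
    return total, parts


# Hypothetical example: one forged image with a 2x2 mask and 3 type classes.
true_mask = [[1, 0], [1, 0]]
good_total, good_parts = multitask_loss(
    0.9, 1, [[0.9, 0.1], [0.8, 0.2]], true_mask, [0.1, 0.8, 0.1], 1)
bad_total, bad_parts = multitask_loss(
    0.2, 1, [[0.3, 0.7], [0.4, 0.6]], true_mask, [0.6, 0.2, 0.2], 1)
```

Because the mask term is computed per pixel, a model cannot minimize the total by predicting the image-level label alone, which is one plausible reading of how joint training pushes features to be spatially grounded.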

🛡️ Threat Analysis

Output Integrity Attack

ForensicFormer is a novel AI-generated content detection architecture targeting GAN outputs, diffusion model images, and traditional image manipulations. It directly addresses output integrity and authenticity verification of AI-generated imagery.

Details

Domains

vision

Model Types

transformer, cnn

Threat Tags

inference_time, digital

Datasets

CASIA2, NIST16, DEFACTO, ForenSynths, DiffusionDB, RAISE

Applications

2026 1 cit.
