On the Holistic Approach for Detecting Human Image Forgery

The rapid advancement of AI-generated content (AIGC) has escalated the threat of deepfakes, from facial manipulations to the synthesis of entire photorealistic human bodies. However, existing detection methods remain fragmented, specializing either in facial-region forgeries or full-body synthetic images, and consequently fail to generalize across the full spectrum of human image manipulations. We introduce HuForDet, a holistic framework for human image forgery detection, which features a dual-branch architecture comprising: (1) a face forgery detection branch that employs heterogeneous experts operating in both RGB and frequency domains, including an adaptive Laplacian-of-Gaussian (LoG) module designed to capture artifacts ranging from fine-grained blending boundaries to coarse-scale texture irregularities; and (2) a contextualized forgery detection branch that leverages a Multi-Modal Large Language Model (MLLM) to analyze full-body semantic consistency, enhanced with a confidence estimation mechanism that dynamically weights its contribution during feature fusion. We curate a human image forgery (HuFor) dataset that unifies existing face forgery data with a new corpus of full-body synthetic humans. Extensive experiments show that our HuForDet achieves state-of-the-art forgery detection performance and superior robustness across diverse human image forgeries.
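The summary does not give the exact form of the adaptive Laplacian-of-Gaussian (LoG) module. As a rough illustration only, the sketch below assumes that "adaptive" means a softmax-weighted mix of scale-normalized LoG responses computed at several Gaussian widths. In a real model the per-scale logits would be learned; here they are passed in. The function names, the default sigmas, and the weighting scheme are all hypothetical. Small sigmas respond to fine-grained blending boundaries, and larger sigmas respond to coarser texture irregularities.

```python
import numpy as np


def log_kernel(sigma: float) -> np.ndarray:
    """Scale-normalized Laplacian-of-Gaussian kernel (sigma^2 * LoG), forced to zero mean."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    gauss = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * gauss
    k -= k.mean()  # truncation leaves a small DC term; remove it so flat regions give 0
    return sigma ** 2 * k


def filter2d_same(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Same'-size 2-D filtering with reflect padding.

    The LoG kernel is symmetric, so correlation and convolution coincide.
    """
    r = k.shape[0] // 2
    padded = np.pad(img, r, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)


def adaptive_log(img, sigmas=(0.8, 1.6, 3.2), logits=None):
    """Hypothetical 'adaptive' LoG: softmax-weighted sum of multi-scale LoG responses.

    Returns the fused response map and the stacked per-scale responses.
    """
    img = np.asarray(img, dtype=np.float64)
    if logits is None:
        logits = np.zeros(len(sigmas))
    logits = np.asarray(logits, dtype=np.float64)
    w = np.exp(logits - logits.max())  # numerically stable softmax over scales
    w /= w.sum()
    responses = np.stack([filter2d_same(img, log_kernel(s)) for s in sigmas])
    return np.tensordot(w, responses, axes=1), responses


# Usage: a vertical step edge, a stand-in for a blending boundary.
edge = np.zeros((32, 32))
edge[:, 16:] = 1.0
fused_map, per_scale = adaptive_log(edge)
```

On a step edge, every scale responds strongly next to the boundary and is zero in flat regions. The scale-normalization factor sigma^2 keeps the responses at different widths comparable, so the softmax weights, rather than the raw filter magnitudes, decide which scale dominates.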

Key Contributions

Dual-branch HuForDet architecture combining a face forgery detection branch built from heterogeneous RGB- and frequency-domain experts (including an adaptive Laplacian-of-Gaussian module) and an MLLM-based contextualized branch with confidence estimation for full-body synthetic image detection
HuFor dataset that unifies existing face forgery benchmarks with a new corpus of full-body photorealistic synthetic humans, enabling holistic evaluation
Confidence-weighted feature fusion mechanism that dynamically balances MLLM semantic consistency scores with low-level artifact signals
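The summary states only that an estimated confidence dynamically weights the MLLM branch's contribution during feature fusion. The sketch below is a minimal illustration under stated assumptions: a sigmoid confidence gate computed from the contextual feature, fusion by concatenation, and a linear classification head. All names, dimensions, and the gate form are hypothetical, and random vectors stand in for real branch features.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def confidence_weighted_fusion(f_face, f_ctx, w_conf, b_conf):
    """Gate the contextual (MLLM) feature by an estimated confidence, then concatenate.

    Hypothetical gate: confidence = sigmoid(w_conf . f_ctx + b_conf). A low-confidence
    contextual feature is shrunk toward zero, so the face-branch artifact feature dominates.
    """
    f_face = np.asarray(f_face, dtype=np.float64)
    f_ctx = np.asarray(f_ctx, dtype=np.float64)
    conf = float(sigmoid(f_ctx @ np.asarray(w_conf, dtype=np.float64) + b_conf))
    fused = np.concatenate([f_face, conf * f_ctx])
    return fused, conf


def forgery_probability(fused, w_cls, b_cls):
    """Linear head + sigmoid on the fused feature (placeholder for a real classifier)."""
    return float(sigmoid(fused @ np.asarray(w_cls, dtype=np.float64) + b_cls))


# Usage with random stand-in features: 8-d face feature, 16-d contextual feature.
rng = np.random.default_rng(0)
f_face, f_ctx = rng.normal(size=8), rng.normal(size=16)
fused, conf = confidence_weighted_fusion(f_face, f_ctx, rng.normal(size=16) * 0.1, 0.0)
p_fake = forgery_probability(fused, rng.normal(size=24) * 0.1, 0.0)
```

Gating the contextual feature rather than the face feature is one plausible reading of the design. When the MLLM's semantic judgment is unreliable, the classifier falls back on low-level artifact evidence instead of being misled by it.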

🛡️ Threat Analysis

Output Integrity Attack

The core contribution is detecting AI-generated content, specifically face forgeries and full-body synthetic human images (deepfakes). The paper proposes a novel detection architecture that combines heterogeneous RGB- and frequency-domain experts with MLLM-based semantic consistency analysis. This directly addresses output integrity and AI-generated content detection.

Details

Domains

vision, multimodal

Model Types

vlm, transformer, cnn

Threat Tags

inference_time

Datasets

HuFor (curated)

Applications

2025
