Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

AI-generated images are becoming increasingly realistic and diverse, posing significant challenges for generalizable detection. While Vision Foundation Models (VFMs) provide rich semantic representations and frequency-based methods capture complementary artifact cues, existing approaches that combine these modalities still suffer from limited generalization, with notable performance degradation on unseen generative models. We attribute this limitation to two key factors: frequency shortcut bias toward easily distinguishable cues associated with specific generators and cross-domain representation conflict between high-level semantics and low-level frequency patterns. To address these issues, we propose a Frequency-aware Gated Injection Network (FGINet) to improve generalization. Specifically, we design a Band-Masked Frequency Encoder (BMFE) that applies cross-band masking in the frequency domain to reduce reliance on generator-specific patterns and encourage more diverse and generalizable representations. We further introduce a Layer-wise Gated Frequency Injection (LGFI) mechanism to progressively inject frequency cues into the VFM backbone with adaptive gating, aligning with its hierarchical abstraction and alleviating representation conflict. Moreover, we propose a Hyperspherical Compactness Learning (HCL) framework with a cosine margin objective to learn compact and well-separated representations. Extensive experiments demonstrate that FGINet achieves state-of-the-art performance and strong generalization across multiple challenging datasets.
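The cross-band masking idea behind BMFE can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the radial band partition, the number of bands, and the function names `radial_band_mask` and `mask_frequency_band` are assumptions made for illustration only. The sketch moves a grayscale image into the Fourier domain, zeroes one band (chosen at random unless specified), and transforms back, so a model trained on such inputs cannot depend on any single band.

```python
import numpy as np


def radial_band_mask(height, width, num_bands, masked_band):
    """Build a 0/1 mask over a centered (fftshift-ed) spectrum.

    Frequencies are grouped into equal-width radial bands (an assumed
    partition); every band is kept except `masked_band`.
    """
    ys = np.arange(height) - height // 2
    xs = np.arange(width) - width // 2
    radius = np.sqrt(ys[:, None] ** 2 + xs[None, :] ** 2)
    # Band 0 contains the DC component; the last band reaches the corners.
    edges = np.linspace(0.0, radius.max() + 1e-6, num_bands + 1)
    band_index = np.digitize(radius, edges[1:-1])  # values in [0, num_bands - 1]
    return (band_index != masked_band).astype(np.float64)


def mask_frequency_band(image, num_bands=4, masked_band=None, rng=None):
    """Zero one radial frequency band of a 2-D image and return the result.

    If `masked_band` is None, a band is drawn uniformly at random.
    """
    rng = np.random.default_rng() if rng is None else rng
    if masked_band is None:
        masked_band = int(rng.integers(num_bands))
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = radial_band_mask(image.shape[0], image.shape[1], num_bands, masked_band)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The mask is symmetric about the spectrum center, so the imaginary
    # part is only numerical noise and can be dropped.
    return filtered.real, masked_band


# Hypothetical usage on a random 16x16 "image".
example_image = np.random.default_rng(0).standard_normal((16, 16))
augmented, dropped_band = mask_frequency_band(example_image, num_bands=4)
```

Drawing a different band for each training sample is one plausible way to discourage reliance on generator-specific frequency patterns; the masking schedule actually used by the paper is not stated in this summary.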

Key Contributions

Band-Masked Frequency Encoder (BMFE) that applies cross-band masking to reduce reliance on generator-specific frequency shortcuts
Layer-wise Gated Frequency Injection (LGFI) mechanism to progressively fuse frequency cues with VFM semantic features while alleviating representation conflict
Hyperspherical Compactness Learning (HCL) framework with cosine margin objective for compact and well-separated representations
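The second and third contributions can be sketched in a few lines of plain Python. Both functions are illustrative assumptions rather than the paper's formulation: `gated_injection` uses a single scalar gate per layer (the paper describes the gating only as adaptive), and `cosine_margin_loss` uses an additive cosine margin in the style of large-margin cosine losses, with hypothetical default values for `margin` and `scale`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_injection(semantic, frequency, gate_logit):
    """Add frequency features to semantic features through a gate.

    Computes h' = h + sigmoid(gate_logit) * f element-wise. A strongly
    negative gate_logit leaves the semantic features almost untouched.
    """
    gate = sigmoid(gate_logit)
    return [s + gate * f for s, f in zip(semantic, frequency)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_margin_loss(feature, class_centers, label, margin=0.35, scale=30.0):
    """Cross-entropy over scaled cosine similarities with a margin.

    The margin is subtracted from the true-class cosine only, so the
    feature must be closer to its own center by at least `margin`
    (on the unit hypersphere) before the loss becomes small.
    """
    logits = []
    for index, center in enumerate(class_centers):
        similarity = cosine(feature, center)
        if index == label:
            similarity -= margin
        logits.append(scale * similarity)
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


# Hypothetical usage: class 0 = real, class 1 = AI-generated.
fused = gated_injection([0.6, 0.2], [0.4, 0.8], gate_logit=0.0)
loss = cosine_margin_loss(fused, [[1.0, 0.0], [0.0, 1.0]], label=0)
```

In a layer-wise setting, the gated step would be repeated after each selected backbone layer with its own gate, letting early layers take in more frequency information than late ones or the reverse, depending on what training finds useful.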

🛡️ Threat Analysis

Output Integrity Attack

The paper proposes a method for detecting AI-generated images (deepfake and synthetic image detection), a core output-integrity and content-authenticity problem. The goal is to determine whether an image is real or AI-generated, which directly addresses content provenance and authentication.

Details

Domains

vision, generative

Model Types

diffusion, gan, transformer

Threat Tags

inference_time

Applications

2026 · 0 citations

Similar Papers

Detecting AI-Generated Images via Distributional Deviations from Real Images

Semantic-Aware Reconstruction Error for Detecting AI-Generated Images

Detecting Generated Images by Fitting Natural Image Distributions

NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection

Generalizable and Adaptive Continual Learning Framework for AI-generated Image Detection

HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild

Training-free Detection of AI-generated images via Cropping Robustness

Aggregating Diverse Cue Experts for AI-Generated Image Detection