Color Matters: Demosaicing-Guided Color Correlation Training for Generalizable AI-Generated Image Detection

As realistic AI-generated images threaten digital authenticity, we address the generalization failure of generative artifact-based detectors by exploiting the intrinsic properties of the camera imaging pipeline. Concretely, we investigate color correlations induced by the color filter array (CFA) and demosaicing, and propose a Demosaicing-guided Color Correlation Training (DCCT) framework for AI-generated image detection. By simulating the CFA sampling pattern, we decompose each color image into a single-channel input (as the condition) and the remaining two channels as the ground-truth targets (for prediction). A self-supervised U-Net is trained to model the conditional distribution of the missing channels from the given one, parameterized via a mixture of logistic functions. Our theoretical analysis reveals that DCCT targets a provable distributional difference in color-correlation features between photographic and AI-generated images. By leveraging these distinct features to construct a binary classifier, DCCT achieves state-of-the-art generalization and robustness, significantly outperforming prior methods across over 20 unseen generators.

Key Contributions

DCCT framework that self-supervisedly trains a U-Net to model conditional inter-channel color distributions induced by camera CFA demosaicing, then uses learned features for AI-generated image detection
Theoretical analysis proving a non-vanishing 1-Wasserstein distance lower bound between photographic and AI-generated images in the CFA-derived high-frequency color-correlation feature space
State-of-the-art generalization and robustness across 20+ unseen generators, outperforming both artifact-based and vision-language representation-based baselines

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated image detection method — detecting synthetic visual content is core Output Integrity (ML09). The paper introduces a forensic technique exploiting camera-intrinsic color correlations as a physically-grounded discriminative cue between real and AI-generated images.