defense 2026

Degradation-Consistent Paired Training for Robust AI-Generated Image Detection

Zongyou Yang 1, Yinghan Hou 2, Xiaokun Yang 3

Published on arXiv

2604.10102

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves a +9.1 percentage-point improvement in degraded-condition average accuracy across 9 generators and 8 degradation conditions, with the most dramatic gains under JPEG compression (+15.7 to +17.9 points), at a cost of only 0.9% clean accuracy

DCPT

Novel technique introduced


AI-generated image detectors suffer significant performance degradation under real-world image corruptions such as JPEG compression, Gaussian blur, and resolution downsampling. We observe that state-of-the-art methods, including B-Free, treat degradation robustness as a byproduct of data augmentation rather than an explicit training objective. In this work, we propose Degradation-Consistent Paired Training (DCPT), a simple yet effective training strategy that explicitly enforces robustness through paired consistency constraints. For each training image, we construct a clean view and a degraded view, then impose two constraints: a feature consistency loss that minimizes the cosine distance between clean and degraded representations, and a prediction consistency loss based on symmetric KL divergence that aligns output distributions across views. DCPT adds zero additional parameters and zero inference overhead. Experiments on the Synthbuster benchmark (9 generators, 8 degradation conditions) demonstrate that DCPT improves the degraded-condition average accuracy by 9.1 percentage points compared to an identical baseline without paired training, while sacrificing only 0.9% clean accuracy. The improvement is most pronounced under JPEG compression (+15.7% to +17.9%). Ablation further reveals that adding architectural components leads to overfitting on limited training data, confirming that training objective improvement is more effective than architectural augmentation for degradation robustness.
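The paired-view construction described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the real method samples corruptions such as JPEG compression, Gaussian blur, and downsampling, while here a toy stride-based downsample/upsample stands in for a degradation, and the function names (`degrade_downsample`, `make_pair`) are assumptions for illustration.

```python
import numpy as np

def degrade_downsample(img, factor=2):
    """Toy degradation: subsample by striding, then nearest-neighbor
    upsample back to the original size. Stands in for real corruptions
    (JPEG, blur, resolution downsampling) used in the paper."""
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def make_pair(img):
    # Each training image yields a clean view and a degraded view;
    # both are fed through the same detector during training.
    return img, degrade_downsample(img)
```

In the actual method, one corruption would typically be drawn at random per image so the detector sees a varied mix of degradations across training.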


Key Contributions

  • Degradation-Consistent Paired Training (DCPT) method with zero additional parameters or inference overhead
  • Explicit consistency constraints via feature-level cosine loss and prediction-level symmetric KL divergence between clean and degraded image views
  • Empirical validation showing a +9.1 percentage-point degraded-condition average accuracy improvement on the Synthbuster benchmark, with +15.7 to +17.9 point gains under JPEG compression
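The two consistency constraints can be written down concretely. The sketch below assumes a standard formulation (cosine-distance feature loss, symmetric KL over softmax outputs, losses summed with weights); the loss weights `lam_feat` and `lam_pred` and the exact combination with the supervised cross-entropy terms are assumptions, as the summary does not specify them.

```python
import numpy as np

def feature_consistency_loss(f_clean, f_deg):
    """Cosine distance between clean-view and degraded-view features."""
    cos = np.dot(f_clean, f_deg) / (
        np.linalg.norm(f_clean) * np.linalg.norm(f_deg))
    return 1.0 - cos

def symmetric_kl(p, q, eps=1e-8):
    """Symmetric KL divergence between two output distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)

def dcpt_loss(ce_clean, ce_deg, f_clean, f_deg, p_clean, p_deg,
              lam_feat=1.0, lam_pred=1.0):
    # Supervised cross-entropy on both views plus the two paired
    # consistency terms; weights are illustrative assumptions.
    return (ce_clean + ce_deg
            + lam_feat * feature_consistency_loss(f_clean, f_deg)
            + lam_pred * symmetric_kl(p_clean, p_deg))
```

Both consistency terms vanish when the two views produce identical features and predictions, which is why the method adds no parameters and no inference-time cost: the constraints shape training only.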

🛡️ Threat Analysis

Output Integrity Attack

Defends AI-generated content detection systems (deepfake detectors) against degradation-based evasion. The paper explicitly addresses the integrity and reliability of AI-generated image detection under real-world corruptions like JPEG compression, blur, and downsampling. This is output integrity — ensuring detection systems maintain accuracy when content has been degraded.


Details

Domains
vision, generative
Model Types
transformer, diffusion, gan
Threat Tags
inference_time, digital
Datasets
Synthbuster
Applications
ai-generated image detection, deepfake detection, synthetic media forensics