Degradation-Consistent Paired Training for Robust AI-Generated Image Detection
Zongyou Yang 1, Yinghan Hou 2, Xiaokun Yang 3
Published on arXiv
2604.10102
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves a +9.1 percentage-point improvement in degraded-condition average accuracy across 9 generators and 8 degradation conditions, with the largest gains under JPEG compression (+15.7 to +17.9 points), at a cost of only 0.9 points of clean accuracy
DCPT
Novel technique introduced
AI-generated image detectors suffer significant performance degradation under real-world image corruptions such as JPEG compression, Gaussian blur, and resolution downsampling. We observe that state-of-the-art methods, including B-Free, treat degradation robustness as a byproduct of data augmentation rather than an explicit training objective. In this work, we propose Degradation-Consistent Paired Training (DCPT), a simple yet effective training strategy that explicitly enforces robustness through paired consistency constraints. For each training image, we construct a clean view and a degraded view, then impose two constraints: a feature consistency loss that minimizes the cosine distance between clean and degraded representations, and a prediction consistency loss based on symmetric KL divergence that aligns output distributions across views. DCPT adds zero additional parameters and zero inference overhead. Experiments on the Synthbuster benchmark (9 generators, 8 degradation conditions) demonstrate that DCPT improves degraded-condition average accuracy by 9.1 percentage points over an identical baseline without paired training, while sacrificing only 0.9 points of clean accuracy. The improvement is most pronounced under JPEG compression (+15.7 to +17.9 points). Ablations further reveal that adding architectural components leads to overfitting on limited training data, confirming that improving the training objective is more effective than architectural augmentation for degradation robustness.
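The two constraints described in the abstract reduce to a simple combined objective. A minimal NumPy sketch, assuming loss weights `lam_feat`/`lam_pred` and a cross-entropy term supplied from outside (the entry does not give the paper's exact formulation or hyperparameters):

```python
import numpy as np

def cosine_consistency(f_clean: np.ndarray, f_deg: np.ndarray) -> float:
    """Feature-level loss: mean cosine distance between the clean and
    degraded views' embeddings, shape (batch, dim)."""
    num = np.sum(f_clean * f_deg, axis=-1)
    den = np.linalg.norm(f_clean, axis=-1) * np.linalg.norm(f_deg, axis=-1) + 1e-8
    return float(np.mean(1.0 - num / den))

def symmetric_kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """Prediction-level loss: symmetric KL divergence between the two
    views' output distributions, shape (batch, classes)."""
    p, q = p + eps, q + eps
    kl_pq = np.sum(p * np.log(p / q), axis=-1)
    kl_qp = np.sum(q * np.log(q / p), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))

def dcpt_loss(f_clean, f_deg, p_clean, p_deg, ce_loss,
              lam_feat=1.0, lam_pred=1.0):
    # Total objective: supervised cross-entropy plus the two consistency
    # terms. The weights lam_feat/lam_pred are illustrative assumptions.
    return (ce_loss
            + lam_feat * cosine_consistency(f_clean, f_deg)
            + lam_pred * symmetric_kl(p_clean, p_deg))
```

Both terms vanish when the two views produce identical features and predictions, which is consistent with the claim of zero additional parameters: only the loss changes, not the network.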
Key Contributions
- Degradation-Consistent Paired Training (DCPT) method with zero additional parameters or inference overhead
- Explicit consistency constraints via feature-level cosine loss and prediction-level symmetric KL divergence between clean and degraded image views
- Empirical validation showing a +9.1 percentage-point degraded-condition average accuracy improvement on the Synthbuster benchmark, with +15.7 to +17.9-point gains under JPEG compression
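Constructing the clean/degraded training pair might look like the following sketch for the resolution-downsampling corruption; the factor and the average-pool/nearest-upsample scheme are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def downsample_degrade(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Simulate resolution loss on an (H, W, C) image: average-pool by
    `factor`, then nearest-neighbor upsample back to the original size."""
    h, w, c = img.shape
    h2, w2 = (h // factor) * factor, (w // factor) * factor
    pooled = (img[:h2, :w2]
              .reshape(h2 // factor, factor, w2 // factor, factor, c)
              .mean(axis=(1, 3)))
    up = np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)
    out = img.astype(np.float64).copy()
    out[:h2, :w2] = up  # border rows/cols beyond the pooled region stay as-is
    return out

def make_pair(img: np.ndarray) -> tuple:
    # One training example = (clean view, degraded view), fed jointly so
    # the consistency losses can compare their features and predictions.
    return img, downsample_degrade(img)
```

In practice the degradation would be sampled per example from the full corruption set (JPEG, blur, downsampling, etc.) so the consistency losses see the same conditions the benchmark evaluates.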
🛡️ Threat Analysis
Defends AI-generated content detection systems (deepfake detectors) against degradation-based evasion. The paper explicitly addresses the integrity and reliability of AI-generated image detection under real-world corruptions such as JPEG compression, blur, and downsampling. This falls under output integrity: ensuring detection systems maintain accuracy when content has been degraded.