defense 2026

Degradation-Consistent Paired Training for Robust AI-Generated Image Detection

Zongyou Yang 1, Yinghan Hou 2, Xiaokun Yang 3

Published on arXiv

2604.10102

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves a +9.1 percentage-point improvement in degraded-condition average accuracy across 9 generators and 8 degradation conditions, with the most dramatic gains under JPEG compression (+15.7 to +17.9 points), at a cost of only 0.9% clean accuracy

DCPT

Novel technique introduced


AI-generated image detectors suffer significant performance degradation under real-world image corruptions such as JPEG compression, Gaussian blur, and resolution downsampling. We observe that state-of-the-art methods, including B-Free, treat degradation robustness as a byproduct of data augmentation rather than an explicit training objective. In this work, we propose Degradation-Consistent Paired Training (DCPT), a simple yet effective training strategy that explicitly enforces robustness through paired consistency constraints. For each training image, we construct a clean view and a degraded view, then impose two constraints: a feature consistency loss that minimizes the cosine distance between clean and degraded representations, and a prediction consistency loss based on symmetric KL divergence that aligns output distributions across views. DCPT adds zero additional parameters and zero inference overhead. Experiments on the Synthbuster benchmark (9 generators, 8 degradation conditions) demonstrate that DCPT improves the degraded-condition average accuracy by 9.1 percentage points compared to an identical baseline without paired training, while sacrificing only 0.9% clean accuracy. The improvement is most pronounced under JPEG compression (+15.7% to +17.9%). Ablation further reveals that adding architectural components leads to overfitting on limited training data, confirming that training objective improvement is more effective than architectural augmentation for degradation robustness.
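The paired-view construction described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the real method samples corruptions such as JPEG compression, Gaussian blur, and downsampling, while here a toy stride-based downsample/upsample stands in for a degradation, and the function names (`degrade_downsample`, `make_pair`) are assumptions for illustration.

```python
import numpy as np

def degrade_downsample(img, factor=2):
    """Toy degradation: subsample by striding, then nearest-neighbor
    upsample back to the original size. Stands in for real corruptions
    (JPEG, blur, resolution downsampling) used in the paper."""
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def make_pair(img):
    # Each training image yields a clean view and a degraded view;
    # both are fed through the same detector during training.
    return img, degrade_downsample(img)
```

In the actual method, one corruption would typically be drawn at random per image so the detector sees a varied mix of degradations across training.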


Key Contributions

  • Degradation-Consistent Paired Training (DCPT) method with zero additional parameters or inference overhead
  • Explicit consistency constraints via feature-level cosine loss and prediction-level symmetric KL divergence between clean and degraded image views
  • Empirical validation showing a +9.1 percentage-point degraded-condition average accuracy improvement on the Synthbuster benchmark, with +15.7 to +17.9 point gains under JPEG compression
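The two consistency constraints can be written down concretely. The sketch below assumes a standard formulation (cosine-distance feature loss, symmetric KL over softmax outputs, losses summed with weights); the loss weights `lam_feat` and `lam_pred` and the exact combination with the supervised cross-entropy terms are assumptions, as the summary does not specify them.

```python
import numpy as np

def feature_consistency_loss(f_clean, f_deg):
    """Cosine distance between clean-view and degraded-view features."""
    cos = np.dot(f_clean, f_deg) / (
        np.linalg.norm(f_clean) * np.linalg.norm(f_deg))
    return 1.0 - cos

def symmetric_kl(p, q, eps=1e-8):
    """Symmetric KL divergence between two output distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)

def dcpt_loss(ce_clean, ce_deg, f_clean, f_deg, p_clean, p_deg,
              lam_feat=1.0, lam_pred=1.0):
    # Supervised cross-entropy on both views plus the two paired
    # consistency terms; weights are illustrative assumptions.
    return (ce_clean + ce_deg
            + lam_feat * feature_consistency_loss(f_clean, f_deg)
            + lam_pred * symmetric_kl(p_clean, p_deg))
```

Both consistency terms vanish when the two views produce identical features and predictions, which is why the method adds no parameters and no inference-time cost: the constraints shape training only.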

🛡️ Threat Analysis

Output Integrity Attack

Defends AI-generated content detection systems (deepfake detectors) against degradation-based evasion. The paper explicitly addresses the integrity and reliability of AI-generated image detection under real-world corruptions like JPEG compression, blur, and downsampling. This is output integrity — ensuring detection systems maintain accuracy when content has been degraded.


Details

Domains
vision, generative
Model Types
transformer, diffusion, gan
Threat Tags
inference_time, digital
Datasets
Synthbuster
Applications
ai-generated image detection, deepfake detection, synthetic media forensics