Defense · 2026

Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated

Muli Yang 1, Gabriel James Goenawan 1, Henan Wang 2, Huaiyuan Qin 1, Chenghao Xu 3, Yanhua Yang 3, Fen Fang 1, Ying Sun 1, Joo-Hwee Lim 1, Hongyuan Zhu 1

0 citations · 73 references · arXiv (Cornell University)


Published on arXiv (arXiv:2602.01973)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Post-hoc logit calibration significantly improves cross-generator detection robustness without retraining, outperforming uncalibrated baselines on challenging open-world benchmarks.

AIGI-Det-Calib

Novel technique introduced


Despite being trained on balanced datasets, existing AI-generated image detectors often exhibit systematic bias at test time, frequently misclassifying fake images as real. We hypothesize that this behavior stems from distributional shift in fake samples and implicit priors learned during training. Specifically, models tend to overfit to superficial artifacts that do not generalize well across different generation methods, leading to a misaligned decision threshold when faced with test-time distribution shift. To address this, we propose a theoretically grounded post-hoc calibration framework based on Bayesian decision theory. In particular, we introduce a learnable scalar correction to the model's logits, optimized on a small validation set from the target distribution while keeping the backbone frozen. This parametric adjustment compensates for distributional shift in model output, realigning the decision boundary even without requiring ground-truth labels. Experiments on challenging benchmarks show that our approach significantly improves robustness without retraining, offering a lightweight and principled solution for reliable and adaptive AI-generated image detection in the open world. Code is available at https://github.com/muliyangm/AIGI-Det-Calib.
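The abstract describes the method as a learnable scalar correction applied to a frozen detector's logits, fit on a small unlabeled validation set from the target distribution. The paper's exact calibration objective is not reproduced here, so the following is only a minimal sketch under stated assumptions: it grid-searches a scalar shift so that the detector's mean predicted fake-probability on unlabeled validation logits matches an assumed class prior. The function name, the grid search, and the prior-matching objective are illustrative choices, not the authors' implementation.

```python
import numpy as np

def calibrate_logit_shift(val_logits, prior_fake=0.5,
                          grid=np.linspace(-5.0, 5.0, 1001)):
    """Pick a scalar shift b for a frozen binary detector's logits.

    Hypothetical prior-matching objective: choose b so that the mean
    sigmoid(logit + b) over UNLABELED validation logits matches an
    assumed fraction of fake images (prior_fake). No ground-truth
    labels are used, mirroring the label-free setting in the abstract.
    """
    best_b, best_gap = 0.0, float("inf")
    for b in grid:
        p_fake = 1.0 / (1.0 + np.exp(-(val_logits + b)))  # sigmoid
        gap = abs(p_fake.mean() - prior_fake)
        if gap < best_gap:
            best_b, best_gap = b, gap
    return best_b

# Toy usage: logits skewed negative, i.e. a detector that tends to
# call fake images "real" (the systematic bias the paper identifies).
biased_logits = np.array([-2.0, -1.0, 0.0])
b = calibrate_logit_shift(biased_logits)  # shifts the boundary back
```

At test time the frozen backbone is untouched; only `logit + b` is thresholded, which is what makes the adjustment lightweight and deployable without retraining.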


Key Contributions

  • Identification of systematic threshold misalignment in AI-generated image detectors caused by distributional shift in fake samples and implicit training priors
  • Theoretically grounded post-hoc calibration framework using Bayesian decision theory with a learnable scalar logit correction optimized on a small unlabeled validation set
  • Demonstrated significant robustness improvements on challenging benchmarks without backbone retraining, enabling lightweight deployment adaptation

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated image detection (synthetic content authenticity), proposing a novel calibration method to improve detectors' output integrity under test-time distributional shift — squarely within the content provenance and AI-generated content detection sub-domain of ML09.


Details

Domains
vision · generative
Model Types
cnn · transformer · diffusion · gan
Threat Tags
inference_time · black_box
Applications
ai-generated image detection · deepfake detection · synthetic image forensics