Defense · 2025

Detection of AI Generated Images Using Combined Uncertainty Measures and Particle Swarm Optimised Rejection Mechanism

Rahul Yumlembam 1, Biju Issac 1, Nauman Aslam 1, Eaby Kollonoor Babu 1, Josh Collyer 2, Fraser Kennedy 2

1 citation · 38 references · Sci. Reports


Published on arXiv (2512.18527)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

The Combined Uncertainty measure rejects ~70% of misclassified OOD AI-generated images from unseen generators, and GP-based uncertainty alone rejects up to 80% of successful FGSM/PGD adversarial attacks on the detector.

Combined Uncertainty PSO Rejection

Novel technique introduced


As AI-generated images become increasingly photorealistic, distinguishing them from natural images poses a growing challenge. This paper presents a robust detection framework that leverages multiple uncertainty measures to decide whether to trust or reject a model's predictions. We focus on three complementary techniques: Fisher Information, which captures the sensitivity of model parameters to input variations; entropy-based uncertainty from Monte Carlo Dropout, which reflects predictive variability; and predictive variance from a Deep Kernel Learning framework using a Gaussian Process classifier. To integrate these diverse uncertainty signals, Particle Swarm Optimisation is used to learn optimal weightings and determine an adaptive rejection threshold. The model is trained on Stable Diffusion-generated images and evaluated on GLIDE, VQDM, Midjourney, BigGAN, and StyleGAN3, each introducing significant distribution shifts. While standard metrics such as prediction probability and Fisher-based measures perform well in-distribution, their effectiveness degrades under shift. In contrast, the Combined Uncertainty measure consistently rejects approximately 70 percent of incorrect predictions on unseen generators, successfully filtering most misclassified AI samples. Although the system occasionally rejects correct predictions from newer generators, this conservative behaviour is acceptable, as rejected samples can support retraining. The framework maintains high acceptance of accurate predictions for natural images and in-domain AI data. Under adversarial attacks using FGSM and PGD, the Combined Uncertainty method rejects around 61 percent of successful attacks, while GP-based uncertainty alone achieves up to 80 percent. Overall, the results demonstrate that multi-source uncertainty fusion provides a resilient and adaptive solution for AI-generated image detection.
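The fusion step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the entropy term follows the standard MC Dropout predictive-entropy formula, while `fisher` and `gp_var` are assumed to be precomputed scalars from the Fisher Information and Gaussian Process components, and the weights and threshold stand in for values that PSO would learn.

```python
import numpy as np

def mc_dropout_entropy(probs):
    """Predictive entropy of the mean softmax distribution over T
    stochastic forward passes (probs: T x C array of class probabilities)."""
    p_mean = probs.mean(axis=0)
    return float(-np.sum(p_mean * np.log(p_mean + 1e-12)))

def combined_uncertainty(fisher, entropy, gp_var, weights):
    """Weighted fusion of the three uncertainty signals into one scalar.
    `weights` = (w1, w2, w3) stand in for the PSO-learned weightings."""
    w1, w2, w3 = weights
    return w1 * fisher + w2 * entropy + w3 * gp_var

def accept(score, threshold):
    """Accept the prediction only if combined uncertainty stays at or
    below the (PSO-tuned) rejection threshold."""
    return score <= threshold
```

In this sketch a sample with high fused uncertainty is rejected rather than classified, mirroring the paper's trust-or-reject decision; the illustrative weights sum to 1, though the source does not state that constraint.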


Key Contributions

  • Unified scalar uncertainty score combining Fisher Information, MC Dropout entropy, and GP predictive variance via Deep Kernel Learning, optimally weighted by Particle Swarm Optimisation
  • Adaptive PSO-tuned rejection threshold that filters ~70% of misclassified OOD AI-generated samples from unseen generators (GLIDE, VQDM, Midjourney, BigGAN, StyleGAN3) while preserving accuracy on in-distribution data
  • Secondary adversarial robustness evaluation showing GP-based uncertainty alone rejects up to 80% of successful FGSM/PGD attacks on the detector, and the combined measure rejects ~61%
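To make the PSO-tuned weighting and threshold concrete, here is a minimal global-best particle swarm optimiser. It is a generic sketch, not the paper's optimiser: the inertia (0.7) and cognitive/social coefficients (1.5) are common textbook defaults, and the objective passed in would, in the paper's setting, score a candidate (w1, w2, w3, threshold) vector on validation data, penalising accepted misclassifications and rejected correct predictions.

```python
import numpy as np

def pso_minimise(objective, dim, bounds, n_particles=20, iters=50, seed=0):
    """Minimal global-best PSO. Returns the best position found and its
    objective value. `bounds` = (lo, hi) applies to every dimension."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                  # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()                # global best
    g_val = float(pbest_val.min())
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        if vals.min() < g_val:
            g, g_val = pos[vals.argmin()].copy(), float(vals.min())
    return g, g_val
```

For example, `pso_minimise(lambda x: float(np.sum(x ** 2)), dim=4, bounds=(-1.0, 1.0))` drives a 4-dimensional sphere objective close to zero; in the detection setting the 4 dimensions would correspond to the three uncertainty weights plus the rejection threshold.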

🛡️ Threat Analysis

Output Integrity Attack

The paper's primary contribution is a novel AI-generated image detection framework — a direct output integrity and content authenticity problem. It proposes a new detection architecture (multi-uncertainty fusion via PSO) to distinguish AI-generated from real images across distribution-shifted generators, squarely fitting ML09's AI-generated content detection scope.


Details

Domains
vision, generative
Model Types
cnn, traditional_ml, diffusion, gan
Threat Tags
inference_time, white_box
Datasets
Stable Diffusion, GLIDE, VQDM, Midjourney, BigGAN, StyleGAN3
Applications
ai-generated image detection, deepfake detection, out-of-distribution image filtering