Defense · 2025

Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution

Jaime Álvarez Urueña, David Camacho, Javier Huertas Tato

0 citations · 59 references · arXiv


Published on arXiv · 2511.16541

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 91.3% detection accuracy with only 150 few-shot images per class and improves open-set source attribution AUC by 14.70% over existing approaches without retraining on new generators

SupConLoss + k-NN two-stage detector

Novel technique introduced


The rapid advancement of generative artificial intelligence has enabled the creation of synthetic images that are increasingly indistinguishable from authentic content, posing significant challenges for digital media integrity. This problem is compounded by the accelerated release cycle of novel generative models, which renders traditional detection approaches (reliant on periodic retraining) computationally infeasible and operationally impractical. This work proposes a novel two-stage detection framework designed to address the generalization challenge inherent in synthetic image detection. The first stage employs a deep vision model trained via supervised contrastive learning to extract discriminative embeddings from input imagery. Critically, this model was trained on a strategically partitioned subset of available generators, with specific architectures withheld from training to rigorously ablate cross-generator generalization capabilities. The second stage utilizes a k-nearest neighbors (k-NN) classifier operating on the learned embedding space, trained in a few-shot learning paradigm incorporating limited samples from previously unseen test generators. With merely 150 images per class in the few-shot regime, which are easily obtainable from current-generation models, the proposed framework achieves an average detection accuracy of 91.3%, a 5.2 percentage point improvement over existing approaches. For the source attribution task, the proposed approach obtains improvements of 14.70% in AUC and 4.27% in OSCR in an open-set classification setting, marking a significant advancement toward robust, scalable forensic attribution systems capable of adapting to the evolving generative AI landscape without requiring exhaustive retraining protocols.
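The stage-one training objective referenced above (SupConLoss) is the supervised contrastive loss of Khosla et al. (2020): each anchor embedding is pulled toward same-class embeddings and pushed from all others. The following is a minimal NumPy sketch of that loss, not the authors' implementation; all names are illustrative.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive (SupCon) loss, Khosla et al. 2020 (sketch).

    embeddings: (N, D) array, assumed L2-normalized.
    labels: (N,) integer class labels (e.g. real vs. each generator).
    """
    n = embeddings.shape[0]
    sim = embeddings @ embeddings.T / temperature      # pairwise similarities
    logits_mask = ~np.eye(n, dtype=bool)               # exclude self-pairs
    # Log-softmax over all other samples, with max-subtraction for stability.
    sim_max = np.max(np.where(logits_mask, sim, -np.inf), axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * logits_mask
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Positives: same label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                             # anchors with positives
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

Minimizing this loss clusters embeddings by source class, which is what makes the downstream k-NN stage effective with few reference samples.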


Key Contributions

  • Two-stage detection framework combining supervised contrastive learning for embedding extraction with few-shot k-NN classification for generalization to unseen generators
  • Achieves 91.3% average detection accuracy with only 150 images per class, outperforming SOTA by 5.2 percentage points without exhaustive retraining
  • Open-set source attribution approach improving AUC by 14.70% and OSCR by 4.27% for identifying the specific generative model used
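The stage-two classifier described in these contributions amounts to a k-NN vote over the contrastive embedding space, fitted on a small few-shot support set (e.g. 150 images per class). A minimal NumPy sketch under that reading, with hypothetical function and variable names:

```python
import numpy as np

def knn_predict(support_emb, support_labels, query_emb, k=5):
    """Few-shot k-NN over a learned embedding space (sketch).

    support_emb: (M, D) few-shot reference embeddings,
    support_labels: (M,) labels (e.g. 0 = real, 1..G = generator IDs),
    query_emb: (Q, D) embeddings of images to classify.
    Embeddings are assumed L2-normalized, so cosine similarity is a dot product.
    """
    sims = query_emb @ support_emb.T                 # (Q, M) cosine similarities
    nn_idx = np.argsort(-sims, axis=1)[:, :k]        # k most similar supports
    nn_labels = support_labels[nn_idx]               # (Q, k) neighbor labels
    # Majority vote among the k neighbors for each query.
    return np.array([np.bincount(row).argmax() for row in nn_labels])
```

Adapting to a new generator then only requires embedding a handful of its images into the support set; no encoder retraining is needed, which matches the paper's stated goal.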

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses output integrity and content provenance by detecting whether images are AI-generated and attributing them to specific generative models. The paper proposes a novel forensic detection architecture (supervised contrastive learning + k-NN) rather than applying existing methods, qualifying it as an ML09 contribution.


Details

Domains
vision · generative
Model Types
cnn · transformer · diffusion · gan
Threat Tags
inference_time · digital
Datasets
ImageNet · Midjourney · Wukong · Stable Diffusion 1.4 · Stable Diffusion 1.5 · ADM · VQDM · BigGAN · GLIDE
Applications
ai-generated image detection · generative model attribution · digital media forensics