
How Noise Benefits AI-generated Image Detection

Jiazhen Yan 1, Ziqiang Li 1, Fan Wang 2, Kai Zeng 3, Zhangjie Fu 1

0 citations · 83 references · arXiv


Published on arXiv · 2511.16136

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PiN-CLIP achieves a +5.4% improvement in average accuracy over existing methods on an open-world dataset of images from 42 distinct generative models.

PiN-CLIP

Novel technique introduced


The rapid advancement of generative models has made real and synthetic images increasingly indistinguishable. Although extensive efforts have been devoted to detecting AI-generated images, out-of-distribution generalization remains a persistent challenge. We trace this weakness to spurious shortcuts exploited during training, and we observe that small feature-space perturbations can mitigate shortcut dominance. To address this problem in a more controllable manner, we propose Positive-Incentive Noise for CLIP (PiN-CLIP), which jointly trains a noise generator and a detection network under a variational positive-incentive principle. Specifically, we construct positive-incentive noise in the feature space via cross-attention fusion of visual and categorical semantic features. During optimization, the noise is injected into the feature space to fine-tune the visual encoder, suppressing shortcut-sensitive directions while amplifying stable forensic cues, thereby enabling the extraction of more robust and generalized artifact representations. Comparative experiments are conducted on an open-world dataset comprising synthetic images generated by 42 distinct generative models. Our method achieves new state-of-the-art performance, with a notable improvement of 5.4% in average accuracy over existing approaches.
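The abstract's core mechanism — constructing feature-space noise via cross-attention between visual embeddings and categorical semantic embeddings, then injecting it — can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation; the function names, the noise `scale`, and the reparameterized sampling step are all assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_noise(visual, semantic, scale=0.1, seed=0):
    """Sketch of positive-incentive noise construction.

    visual:   (n, d) image embeddings, e.g. from a frozen CLIP encoder
    semantic: (k, d) categorical prompt embeddings (e.g. "real", "fake")

    Visual features act as queries, semantic features as keys/values;
    the fused signal sets the mean of a small feature-space perturbation.
    Returns the noised features and the attention map.
    """
    d = visual.shape[-1]
    attn = softmax(visual @ semantic.T / np.sqrt(d), axis=-1)   # (n, k)
    fused = attn @ semantic                                     # (n, d)
    rng = np.random.default_rng(seed)
    # reparameterized sample centered on the fused semantics
    noise = scale * (fused + 0.01 * rng.standard_normal(fused.shape))
    return visual + noise, attn
```

In the paper's setup this perturbation is learned jointly with the detector so that it suppresses shortcut-sensitive directions; here the attention map simply shows how each image embedding is softly assigned to the category prompts.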


Key Contributions

  • Identifies spurious shortcut learning as the root cause of poor OOD generalization in AI-generated image detectors
  • Proposes PiN-CLIP, which jointly trains a noise generator and detection network under a variational positive-incentive noise principle, injecting semantically aligned feature-space perturbations to suppress shortcut-sensitive directions
  • Achieves state-of-the-art performance with +5.4% average accuracy on an open-world benchmark spanning 42 distinct generative models
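The joint-training idea in the second contribution — optimizing a detection head on noise-augmented features — can be illustrated with a toy step. This is a deliberately simplified stand-in (a linear real/fake head with logistic loss and a numerical gradient), not PiN-CLIP's variational objective; all names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def detection_loss(W, feats, labels):
    """Binary cross-entropy of a linear real/fake head on (noised) features."""
    logits = feats @ W                        # (n,)
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def train_step(W, feats, labels, noise, lr=0.1):
    """One gradient-descent step on noise-augmented features.

    Uses a central-difference numerical gradient for brevity; a real
    implementation would backpropagate through both the detector and
    the noise generator.
    """
    noised = feats + noise
    grad = np.zeros_like(W)
    h = 1e-5
    for i in range(W.size):
        Wp, Wm = W.copy(), W.copy()
        Wp[i] += h
        Wm[i] -= h
        grad[i] = (detection_loss(Wp, noised, labels)
                   - detection_loss(Wm, noised, labels)) / (2 * h)
    return W - lr * grad
```

The point of the sketch is the data flow: the detector never sees clean features during fine-tuning, only perturbed ones, which is what discourages it from latching onto shortcut directions.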

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated image detection — a core ML09 concern. The paper's primary contribution is a novel detection architecture (PiN-CLIP) that improves out-of-distribution generalization by injecting positive-incentive noise to suppress shortcut learning and amplify stable forensic cues, enabling more reliable authentication of real vs. synthetic image content.


Details

Domains
vision, generative
Model Types
transformer, diffusion, gan
Threat Tags
inference_time, black_box
Datasets
Open-world AIGI benchmark (42 generative models)
Applications
ai-generated image detection, digital forensics, synthetic image forensics