defense · arXiv · Nov 17, 2025
Jiazhen Yan, Ziqiang Li, Fan Wang et al. · Nanjing University of Information Science and Technology · University of Macau
Novel gradient surgery framework fine-tunes CLIP for AI-generated image detection while preventing catastrophic forgetting
Output Integrity Attack · vision · multimodal
The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy violations, and the erosion of trust in digital media. Although large-scale multimodal models like CLIP offer strong transferable representations for detecting synthetic content, fine-tuning them often induces catastrophic forgetting, which degrades pre-trained priors and limits cross-domain generalization. To address this issue, we propose the Distillation-guided Gradient Surgery Network (DGS-Net), a novel framework that preserves transferable pre-trained priors while suppressing task-irrelevant components. Specifically, we introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. By projecting task gradients onto the orthogonal complement of harmful directions and aligning them with beneficial directions distilled from a frozen CLIP encoder, DGS-Net unifies prior preservation and irrelevant-component suppression in a single optimization. Extensive experiments on 50 generative models demonstrate that our method outperforms state-of-the-art approaches by an average margin of 6.6, achieving superior detection performance and generalization across diverse generation techniques.
transformer · vlm · gan · diffusion · Nanjing University of Information Science and Technology · University of Macau
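The core gradient-surgery step described in the abstract (remove the component of the task gradient along a harmful direction, then nudge it toward a beneficial direction distilled from the frozen encoder) can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the function name `surgery`, the single-vector directions, and the fixed `align_weight` are all assumptions for the sake of the example.

```python
import numpy as np

def surgery(task_grad, harmful, beneficial, align_weight=0.1):
    # Normalize the harmful direction and project the task gradient
    # onto its orthogonal complement, removing the harmful component.
    h = harmful / np.linalg.norm(harmful)
    g = task_grad - np.dot(task_grad, h) * h
    # Add a small step along the (normalized) beneficial direction,
    # standing in for the distillation-guided alignment term.
    b = beneficial / np.linalg.norm(beneficial)
    return g + align_weight * b

g = np.array([1.0, 1.0, 0.0])       # raw task gradient
h = np.array([0.0, 1.0, 0.0])       # hypothetical harmful direction
b = np.array([0.0, 0.0, 1.0])       # hypothetical beneficial direction
g_new = surgery(g, h, b)
# g_new has no component along h: [1.0, 0.0, 0.1]
```

In the actual method the harmful/beneficial split comes from the gradient-space decomposition against the frozen CLIP encoder rather than from hand-picked vectors.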
defense · arXiv · Nov 20, 2025
Jiazhen Yan, Ziqiang Li, Fan Wang et al. · Nanjing University of Information Science and Technology · University of Macau +1 more
Proposes PiN-CLIP, a noise-guided CLIP fine-tuning method that suppresses spurious shortcuts for generalizable AI-generated image detection
Output Integrity Attack · vision · generative
The rapid advancement of generative models has made real and synthetic images increasingly indistinguishable. Although extensive efforts have been devoted to detecting AI-generated images, out-of-distribution generalization remains a persistent challenge. We trace this weakness to spurious shortcuts exploited during training, and observe that small feature-space perturbations can mitigate shortcut dominance. To address this problem in a more controllable manner, we propose Positive-Incentive Noise for CLIP (PiN-CLIP), which jointly trains a noise generator and a detection network under a variational positive-incentive principle. Specifically, we construct positive-incentive noise in the feature space via cross-attention fusion of visual and categorical semantic features. During optimization, the noise is injected into the feature space to fine-tune the visual encoder, suppressing shortcut-sensitive directions while amplifying stable forensic cues, thereby enabling the extraction of more robust and generalizable artifact representations. Comparative experiments are conducted on an open-world dataset comprising synthetic images generated by 42 distinct generative models. Our method achieves new state-of-the-art performance, with a notable improvement of 5.4 in average accuracy over existing approaches.
transformer · diffusion · gan · Nanjing University of Information Science and Technology · University of Macau · University of Siena
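The feature-space noise injection described above can be sketched with a standard reparameterized sample (noise = mu + sigma * eps) added to the visual features during fine-tuning. This is a minimal sketch under stated assumptions: in PiN-CLIP the noise parameters come from a learned generator driven by cross-attention over visual and categorical semantics, whereas here `mu` and `log_var` are passed in directly, and the function name `inject_pin_noise` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_pin_noise(features, mu, log_var):
    # Reparameterization trick: sample noise = mu + sigma * eps,
    # then add it to the features before they reach the detector head.
    eps = rng.standard_normal(features.shape)
    sigma = np.exp(0.5 * log_var)
    return features + mu + sigma * eps

feats = np.ones((2, 4))
# With near-zero variance the injected noise collapses to its mean,
# so the perturbation is a controlled shift rather than raw jitter.
out = inject_pin_noise(feats, mu=0.5, log_var=-40.0)
```

During joint training, gradients would flow back through `mu` and `log_var` into the noise generator, letting the variational objective shape which feature directions get perturbed.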