defense 2025

DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection

Jiazhen Yan 1, Ziqiang Li 1, Fan Wang 2, Boyu Wang 1, Ziwen He 1, Zhangjie Fu 1

0 citations · 102 references · arXiv


Published on arXiv

2511.13108

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

DGS-Net outperforms state-of-the-art detectors by an average margin of 6.6% across 50 generative models while preserving cross-domain generalization.

DGS-Net (Distillation-Guided Gradient Surgery Network)

Novel technique introduced


The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy violations, and the erosion of trust in digital media. Although large-scale multimodal models like CLIP offer strong transferable representations for detecting synthetic content, fine-tuning them often induces catastrophic forgetting, which degrades pre-trained priors and limits cross-domain generalization. To address this issue, we propose the Distillation-guided Gradient Surgery Network (DGS-Net), a novel framework that preserves transferable pre-trained priors while suppressing task-irrelevant components. Specifically, we introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. By projecting task gradients onto the orthogonal complement of harmful directions and aligning them with beneficial ones distilled from a frozen CLIP encoder, DGS-Net achieves unified optimization of prior preservation and irrelevant-component suppression. Extensive experiments on 50 generative models demonstrate that our method outperforms state-of-the-art approaches by an average margin of 6.6%, achieving superior detection performance and generalization across diverse generation techniques.


Key Contributions

  • Gradient-space decomposition that separates harmful (forgetting) and beneficial (prior-preserving) gradient directions during CLIP fine-tuning
  • Distillation-guided alignment with a frozen CLIP encoder to preserve transferable pre-trained priors while improving real/fake separability
  • Outperforms state-of-the-art AI-generated image detectors by an average of 6.6% across 50 diverse generative models
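The gradient surgery described above reduces to a projection step: the task gradient is projected onto the orthogonal complement of the harmful direction, then nudged along the beneficial direction distilled from the frozen CLIP encoder. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the single-vector directions and the `align_weight` hyperparameter are illustrative assumptions (the actual method operates on full parameter gradients).

```python
import numpy as np

def gradient_surgery(g_task, g_harm, g_benef, align_weight=0.5):
    """Sketch of distillation-guided gradient surgery (illustrative only).

    g_task:  task-loss gradient
    g_harm:  harmful (forgetting-inducing) direction
    g_benef: beneficial direction distilled from a frozen encoder
    """
    eps = 1e-12
    # Project the task gradient onto the orthogonal complement of the
    # harmful direction, removing its forgetting-inducing component.
    h = g_harm / (np.linalg.norm(g_harm) + eps)
    g_proj = g_task - np.dot(g_task, h) * h
    # Add a scaled step along the beneficial (prior-preserving) direction.
    b = g_benef / (np.linalg.norm(g_benef) + eps)
    return g_proj + align_weight * b

# Example: the [1, 0] harmful component is removed, then half a unit of
# the beneficial [0, 1] direction is added.
g = gradient_surgery(np.array([1.0, 1.0]),
                     np.array([1.0, 0.0]),
                     np.array([0.0, 1.0]))
```

In this toy example `g` comes out as `[0.0, 1.5]`: the projection zeroes the component along the harmful direction while keeping the rest of the descent step intact.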

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is a novel AI-generated image detection architecture that identifies synthetic images from GANs and diffusion models, directly addressing output integrity and content authenticity, a core ML09 concern.


Details

Domains
vision, multimodal
Model Types
transformer, vlm, gan, diffusion
Threat Tags
inference_time
Datasets
ProGAN, R3GAN, SDXL, SimSwap
Applications
AI-generated image detection, synthetic image detection, deepfake detection