defense 2026

Beyond Semantic Priors: Mitigating Optimization Collapse for Generalizable Visual Forensics

Jipeng Liu 1, Haichao Shi 1, Siyu Xing 1, Rong Yin 2, Xiao-Yu Zhang 1


Published on arXiv

arXiv:2603.24057

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art generalization across cross-domain and universal forgery benchmarks by mitigating optimization collapse in VLM-based detectors

CoRIT

Novel technique introduced


While Vision-Language Models (VLMs) like CLIP have emerged as a dominant paradigm for generalizable deepfake detection, a representational disconnect remains: their semantic-centric pre-training is ill-suited for capturing non-semantic artifacts inherent to hyper-realistic synthesis. In this work, we identify a failure mode termed Optimization Collapse, where detectors trained with Sharpness-Aware Minimization (SAM) degenerate to random guessing on non-semantic forgeries once the perturbation radius exceeds a narrow threshold. To theoretically formalize this collapse, we propose the Critical Optimization Radius (COR) to quantify the geometric stability of the optimization landscape, and leverage the Gradient Signal-to-Noise Ratio (GSNR) to measure generalization potential. We establish a theorem proving that COR increases monotonically with GSNR, thereby revealing that the geometric instability of SAM optimization originates from degraded intrinsic generalization potential. This result identifies the layer-wise attenuation of GSNR as the root cause of Optimization Collapse in detecting non-semantic forgeries. Although naively reducing perturbation radius yields stable convergence under SAM, it merely treats the symptom without mitigating the intrinsic generalization degradation, necessitating enhanced gradient fidelity. Building on this insight, we propose the Contrastive Regional Injection Transformer (CoRIT), which integrates a computationally efficient Contrastive Gradient Proxy (CGP) with three training-free strategies: Region Refinement Mask to suppress CGP variance, Regional Signal Injection to preserve CGP magnitude, and Hierarchical Representation Integration to attain more generalizable representations. Extensive experiments demonstrate that CoRIT mitigates optimization collapse and achieves state-of-the-art generalization across cross-domain and universal forgery benchmarks.
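The abstract's central diagnostic quantity, the Gradient Signal-to-Noise Ratio (GSNR), can be illustrated on a toy model. The following is a minimal sketch under our own assumptions, not the paper's implementation: GSNR of each parameter is taken as the squared mean of its per-sample gradients divided by their variance, so it is high when sample gradients agree and low when they are dominated by noise.

```python
import numpy as np

# Illustrative GSNR computation on a toy linear regression.
# GSNR_j = (E_i[g_ij])^2 / Var_i(g_ij), where g_ij is the gradient of
# sample i's loss w.r.t. parameter j. All names here are ours.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 0.0])   # last feature is irrelevant
y = X @ w_true + 0.1 * rng.normal(size=256)

w = np.zeros(4)
# Per-sample gradient of the squared error: g_i = 2 * (x_i . w - y_i) * x_i
residual = X @ w - y                              # shape (256,)
per_sample_grads = 2.0 * residual[:, None] * X    # shape (256, 4)

signal = per_sample_grads.mean(axis=0) ** 2
noise = per_sample_grads.var(axis=0)
gsnr = signal / (noise + 1e-12)
print(gsnr)  # informative features yield much higher GSNR than the dead one
```

In this toy setting the irrelevant fourth feature produces near-zero mean gradient but nonzero variance, so its GSNR collapses toward zero, which mirrors the paper's argument that degraded GSNR signals poor intrinsic generalization potential.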


Key Contributions

  • Identifies and formalizes the 'Optimization Collapse' phenomenon in SAM-trained deepfake detectors when detecting non-semantic forgeries
  • Proposes Critical Optimization Radius (COR) and establishes theoretical relationship with Gradient Signal-to-Noise Ratio (GSNR) to explain geometric instability
  • Introduces CoRIT (Contrastive Regional Injection Transformer) with Contrastive Gradient Proxy and three training-free strategies to mitigate optimization collapse and improve cross-domain generalization
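For readers unfamiliar with the optimizer at the center of these contributions, the sketch below shows one Sharpness-Aware Minimization (SAM) step on a toy quadratic loss and highlights the perturbation radius rho, the quantity the paper's Critical Optimization Radius (COR) bounds. This is a generic SAM illustration under our own assumptions, not the paper's CoRIT method.

```python
import numpy as np

# One-parameter-vector SAM on L(w) = 0.5 * w^T A w.
# SAM first ascends to the (approximate) worst point within a rho-ball,
# then descends using the gradient taken at that perturbed point.
A = np.diag([10.0, 1.0])          # ill-conditioned curvature

def grad(w):
    return A @ w

w = np.array([1.0, 1.0])
rho, lr = 0.05, 0.05              # rho: perturbation radius; lr: step size
for _ in range(100):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, norm rho
    w = w - lr * grad(w + eps)                   # descend at perturbed point
print(w)
```

For small rho the iterates contract toward the minimum at the origin; the paper's claim is that once rho exceeds a critical threshold (the COR, which shrinks as GSNR degrades), this stability is lost and the detector collapses to random guessing.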

🛡️ Threat Analysis

Output Integrity Attack

Focuses on detecting AI-generated visual content (deepfakes) and verifying content authenticity — this is output integrity and AI-generated content detection. The paper addresses generalization of deepfake detection across domains and forgery types.


Details

Domains
vision, multimodal
Model Types
VLM, transformer, multimodal
Threat Tags
inference_time
Applications
deepfake detection, visual forensics, synthetic image detection