defense arXiv Jan 28, 2026 · 9w ago
Wenbo Xu, Wei Lu, Xiangyang Luo et al. · Sun Yat-Sen University · State Key Laboratory of Mathematical Engineering and Advanced Computing +1 more
Proposes VLM-based deepfake detector using RLHF and multimodal alignment rewards for explainable forgery reasoning and spatial localization
Output Integrity Attack vision multimodal
Deepfake detection is a widely researched topic that is crucial for combating the spread of malicious content; existing methods mainly model the problem as classification or spatial localization. The rapid advancement of generative models imposes new demands on Deepfake detection. In this paper, we propose multimodal alignment and reinforcement for explainable Deepfake detection via vision-language models, termed MARE, which aims to enhance the accuracy and reliability of Vision-Language Models (VLMs) in Deepfake detection and reasoning. Specifically, MARE designs comprehensive reward functions, incorporating reinforcement learning from human feedback (RLHF), to incentivize the generation of text-spatially aligned reasoning content that adheres to human preferences. In addition, MARE introduces a forgery disentanglement module to capture intrinsic forgery traces from high-level facial semantics, thereby improving its authenticity detection capability. We conduct thorough evaluations of the reasoning content generated by MARE. Both quantitative and qualitative experimental results demonstrate that MARE achieves state-of-the-art performance in terms of accuracy and reliability.
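The "text-spatially aligned" reward idea could be sketched, purely as an illustration, as a composite reward that ties label correctness to the quality of the predicted forgery box. Everything below (function names, weights, the 0.5/0.5 split, the box format) is an assumption for the sketch, not MARE's actual reward design:

```python
# Hypothetical composite RLHF-style reward: answer correctness plus
# spatial grounding (IoU of predicted vs. annotated forgery region).

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def alignment_reward(pred_label, true_label, pred_box, true_box,
                     w_cls=0.5, w_loc=0.5):
    """Weighted sum of label correctness and localization IoU.

    For genuine faces there is no forgery box, so the localization
    term is vacuously satisfied.
    """
    r_cls = 1.0 if pred_label == true_label else 0.0
    r_loc = box_iou(pred_box, true_box) if true_label == "fake" else 1.0
    return w_cls * r_cls + w_loc * r_loc
```

A policy-gradient method (e.g. PPO or GRPO) would then push the VLM toward responses whose stated verdict and cited region both agree with annotation.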
vlm transformer Sun Yat-Sen University · State Key Laboratory of Mathematical Engineering and Advanced Computing · University of Macau
defense arXiv Jan 21, 2026 · 10w ago
Liqin Wang, Qianyue Hu, Wei Lu et al. · Sun Yat-Sen University · State Key Laboratory of Mathematical Engineering and Advanced Computing
Adversarial perturbations that trigger cascading disruption of diffusion face-swapping pipelines by corrupting identity extraction and injection to prevent deepfakes
Input Manipulation Attack Output Integrity Attack vision generative
The rapid evolution of diffusion models has democratized face swapping but also raises concerns about privacy and identity security. Existing proactive defenses, often adapted from image editing attacks, prove ineffective in this context. We attribute this failure to an oversight of the structural resilience and the unique static conditional guidance mechanism inherent in face swapping systems. To address this, we propose VoidFace, a systemic defense method that views face swapping as a coupled identity pathway. By injecting perturbations at critical bottlenecks, VoidFace induces cascading disruption throughout the pipeline. Specifically, we first introduce localization disruption and identity erasure to degrade physical regression and semantic embeddings, thereby impairing the accurate modeling of the source face. We then intervene in the generative domain by decoupling attention mechanisms to sever identity injection, and corrupting intermediate diffusion features to prevent the reconstruction of source identity. To ensure visual imperceptibility, we perform adversarial search in the latent manifold, guided by a perceptual adaptive strategy to balance attack potency with image quality. Extensive experiments show that VoidFace outperforms existing defenses across various diffusion-based swapping models, while producing adversarial faces with superior visual quality.
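The identity-erasure step could be illustrated as a budgeted signed-gradient attack that pushes an identity embedding away from its clean value under an L-infinity constraint. The linear "encoder" below is a stand-in so the sketch runs without a deep model; VoidFace operates on real face encoders and diffusion features, so none of these names or values come from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # stand-in linear "identity encoder"

def embed(x):
    return W @ x

def erase_identity(x, eps=0.05, steps=20, alpha=0.01):
    """Signed-gradient ascent on ||embed(x+d) - embed(x)||^2, |d|_inf <= eps."""
    target = embed(x)
    # small random start so the first gradient is nonzero
    delta = np.clip(0.001 * rng.standard_normal(x.shape), -eps, eps)
    for _ in range(steps):
        # for a linear encoder, the gradient w.r.t. delta is 2 W^T W delta
        grad = 2 * W.T @ (embed(x + delta) - target)
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return x + delta

x = rng.standard_normal(16)
x_adv = erase_identity(x)
```

The clip keeps the perturbation imperceptible in spirit; the paper's perceptual adaptive strategy replaces this fixed budget with a quality-aware search in the latent manifold.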
diffusion Sun Yat-Sen University · State Key Laboratory of Mathematical Engineering and Advanced Computing
defense arXiv Jan 29, 2026 · 9w ago
Lingxiao Chen, Liqin Wang, Wei Lu et al. · Sun Yat-Sen University · State Key Laboratory of Mathematical Engineering and Advanced Computing
Fingerprints diffusion models via denoising trajectory manifolds to verify copyright in black-box API settings without model modification
Model Theft vision generative
The exceptional performance of diffusion models establishes them as high-value intellectual property but exposes them to unauthorized replication. Existing protection methods either modify the model to embed watermarks, which impairs performance, or extract model fingerprints by manipulating the denoising process, rendering them incompatible with black-box APIs. In this paper, we propose TrajPrint, a completely lossless and training-free framework that verifies model copyright by extracting unique manifold fingerprints formed during deterministic generation. Specifically, we first use a watermarked image as an anchor and exactly trace the path back to its trajectory origin, effectively locking the model fingerprint mapped by this path. Subsequently, we employ a joint optimization strategy with dual-end anchoring to synthesize a specific fingerprint noise that strictly adheres to the target manifold for robust watermark recovery. Used as input, this noise enables the protected target model to recover the watermarked image, while non-target models fail to do so. Finally, we achieve verification via atomic inference and statistical hypothesis testing. Extensive experiments demonstrate that TrajPrint achieves lossless verification in black-box API scenarios with superior robustness against model modifications.
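The final hypothesis-testing step could look like the sketch below: count how many recovered watermark bits match the target and reject the chance-matching null. The bit-extraction step, the significance level, and the Binomial(n, 0.5) null are placeholder assumptions for illustration, not TrajPrint's actual test:

```python
from math import comb

def binom_pvalue(k, n, p=0.5):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def verify(recovered_bits, target_bits, alpha=1e-6):
    """Claim ownership only if the bit-match count is implausible
    under the null that a non-target model matches at chance."""
    n = len(target_bits)
    k = sum(r == t for r, t in zip(recovered_bits, target_bits))
    return binom_pvalue(k, n) < alpha
```

With 64 bits, a perfect recovery has null probability 2^-64, while a non-target model matching roughly half the bits stays far above any sane threshold.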
diffusion Sun Yat-Sen University · State Key Laboratory of Mathematical Engineering and Advanced Computing
defense arXiv Jan 29, 2026 · 9w ago
Midou Guo, Qilin Yin, Wei Lu et al. · Sun Yat-Sen University · Alibaba Group +1 more
Weakly supervised deepfake temporal localization using MAE reconstruction errors and asymmetric contrastive loss on multimodal video
Output Integrity Attack vision audio multimodal
Modern deepfakes have evolved into localized and intermittent manipulations that require fine-grained temporal localization. The prohibitive cost of frame-level annotation makes weakly supervised methods, which rely only on video-level labels, a practical necessity. To this end, we propose Reconstruction-based Temporal Deepfake Localization (RT-DeepLoc), a weakly supervised temporal forgery localization framework that identifies forgeries via reconstruction errors. Our framework uses a Masked Autoencoder (MAE) trained exclusively on authentic data to learn its intrinsic spatiotemporal patterns; this allows the model to produce significant reconstruction discrepancies for forged segments, effectively providing the missing fine-grained cues for localization. To robustly leverage these indicators, we introduce a novel Asymmetric Intra-video Contrastive Loss (AICL). By focusing on the compactness of authentic features guided by these reconstruction cues, AICL establishes a stable decision boundary that enhances local discrimination while preserving generalization to unseen forgeries. Extensive experiments on large-scale datasets, including LAV-DF, demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly-supervised temporal forgery localization.
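The core localization cue reduces to thresholding per-segment reconstruction errors against the video's own error statistics. A minimal sketch, assuming errors have already been produced by an MAE trained on authentic data (the robust median/MAD rule here is an illustrative stand-in, not the paper's decision rule):

```python
import numpy as np

def localize(errors, z=2.0):
    """Flag segments whose reconstruction error is more than z median
    absolute deviations above the video's median error.

    Authentic-only training means real segments reconstruct well, so
    forged segments stand out as high-error outliers.
    """
    errors = np.asarray(errors, dtype=float)
    med = np.median(errors)
    mad = np.median(np.abs(errors - med)) + 1e-8  # avoid zero scale
    return np.where(errors > med + z * mad)[0]
```

Using per-video statistics rather than a global threshold makes the rule robust to videos with uniformly higher compression or noise.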
transformer multimodal Sun Yat-Sen University · Alibaba Group · State Key Laboratory of Mathematical Engineering and Advanced Computing