
Towards Imperceptible Adversarial Defense: A Gradient-Driven Shield against Facial Manipulations

Yue Li 1, Linying Xue 1, Dongdong Lin 1, Qiushi Li 2, Hui Tian 1, Hongxia Wang 3

0 citations · 39 references · arXiv


Published on arXiv

2510.01699

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

GRASP achieves PSNR > 40 dB, SSIM = 0.99, and 100% defense success rate against facial attribute manipulations, significantly outperforming existing proactive defense approaches in visual quality.

GRASP

Novel technique introduced


With the flourishing prosperity of generative models, manipulated facial images have become increasingly accessible, raising concerns about privacy infringement and societal trust. In response, proactive defense strategies embed adversarial perturbations into facial images to counter deepfake manipulation. However, existing methods often face a tradeoff between imperceptibility and defense effectiveness: strong perturbations may disrupt forgeries but degrade visual fidelity. Recent studies have attempted to address this issue by introducing additional visual loss constraints, yet they often overlook the underlying gradient conflicts among losses, ultimately weakening defense performance. To bridge this gap, we propose a gradient-projection-based adversarial proactive defense (GRASP) method that effectively counters facial deepfakes while minimizing perceptual degradation. GRASP is the first approach to successfully integrate both structural similarity loss and low-frequency loss to enhance perturbation imperceptibility. By analyzing gradient conflicts between the defense effectiveness loss and the visual quality losses, GRASP pioneers a gradient-projection mechanism to mitigate these conflicts, enabling balanced optimization that preserves image fidelity without sacrificing defensive performance. Extensive experiments validate the efficacy of GRASP, achieving a PSNR exceeding 40 dB, SSIM of 0.99, and a 100% defense success rate against facial attribute manipulations, significantly outperforming existing approaches in visual quality.
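The gradient-projection idea in the abstract can be sketched in a few lines: when the defense-loss gradient and a visual-quality-loss gradient conflict (negative dot product), project the visual gradient onto the plane orthogonal to the defense gradient before combining the two. This is a minimal PCGrad-style sketch over flattened gradient vectors; the function name and the choice to project the visual gradient (rather than the defense gradient) are assumptions, not the paper's exact formulation.

```python
import numpy as np

def project_conflicting(g_def, g_vis):
    """Combine a defense-loss gradient and a visual-quality-loss gradient,
    resolving conflicts by projection (hypothetical PCGrad-style sketch).

    g_def, g_vis: 1-D numpy arrays (flattened gradients w.r.t. the perturbation).
    """
    dot = np.dot(g_def, g_vis)
    if dot < 0:
        # Conflict: remove from g_vis its component along g_def, so the
        # visual term no longer pushes against the defense objective.
        g_vis = g_vis - dot / np.dot(g_def, g_def) * g_def
    return g_def + g_vis

# Example: the two gradients point in conflicting directions.
g_def = np.array([1.0, 0.0])
g_vis = np.array([-1.0, 1.0])
g = project_conflicting(g_def, g_vis)  # conflict removed: [1.0, 1.0]
```

After projection, the combined update never has a negative component along the defense gradient, which is the balance the paper attributes to its mechanism.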


Key Contributions

  • First proactive deepfake defense to successfully integrate both structural similarity (SSIM) loss and low-frequency loss for perturbation imperceptibility
  • Gradient-projection mechanism that resolves conflicts between defense effectiveness loss and visual quality losses, enabling balanced optimization
  • Achieves PSNR > 40 dB and SSIM = 0.99 with 100% defense success rate against facial attribute manipulations, outperforming prior methods
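The low-frequency loss named in the first bullet can be illustrated with a simple spectral-energy measure: penalize the perturbation's energy in a centered low-frequency band of its 2-D spectrum, so optimization pushes the perturbation toward high frequencies, where the eye is less sensitive. This is a hypothetical sketch; the function name, the FFT-band formulation, and the `cutoff` parameter are assumptions, not the paper's definition.

```python
import numpy as np

def low_frequency_loss(perturbation, cutoff=0.25):
    """Mean squared magnitude of the perturbation's spectrum inside a
    centered low-frequency band (hypothetical sketch of a low-frequency
    penalty; `cutoff` is the fraction of the spectrum treated as 'low')."""
    h, w = perturbation.shape
    # Shift the 2-D spectrum so the DC / low-frequency terms sit at the center.
    spec = np.fft.fftshift(np.fft.fft2(perturbation))
    cy, cx = h // 2, w // 2
    ry, rx = int(h * cutoff / 2), int(w * cutoff / 2)
    band = spec[cy - ry:cy + ry + 1, cx - rx:cx + rx + 1]
    return float(np.mean(np.abs(band) ** 2))

# A flat (DC) perturbation is penalized heavily; a high-frequency
# checkerboard of the same amplitude is penalized far less.
i, j = np.indices((8, 8))
flat = np.ones((8, 8))
checker = ((-1) ** (i + j)).astype(float)
```

Minimizing this term alongside the SSIM loss concentrates the perturbation in high-frequency detail, which is one plausible reading of how the method keeps PSNR above 40 dB.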

🛡️ Threat Analysis

Output Integrity Attack

Proposes proactive adversarial perturbations embedded in facial images to prevent deepfake manipulation — directly addresses content integrity and authenticity protection. The OWASP guidelines explicitly cite anti-deepfake perturbations as an ML09 artifact, confirming that papers proposing such protections fall under output integrity defense.


Details

Domains
vision, generative
Model Types
GAN, diffusion, CNN
Threat Tags
white_box, digital, inference_time
Datasets
CelebA-HQ
Applications
facial deepfake defense, face attribute manipulation, facial privacy protection