defense 2025

Towards Robust Protective Perturbation against DeepFake Face Swapping

Hengyang Yao 1, Lin Li 2, Ke Sun 3, Jianing Qiu 4, Huiping Chen 1

0 citations · 46 references · arXiv

Published on arXiv

2512.07228

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

EOLT achieves 26% higher average robustness than state-of-the-art protective perturbation methods across 30 transformations, with up to 30% improvement on the most challenging categories.

EOLT (Expectation Over Learned distribution of Transformation)

Novel technique introduced


DeepFake face swapping enables highly realistic identity forgeries, posing serious privacy and security risks. A common defense embeds invisible perturbations into images, but these are fragile and often destroyed by basic transformations such as compression or resizing. In this paper, we first conduct a systematic analysis of 30 transformations across six categories and show that protection robustness is highly sensitive to the choice of training transformations, making the standard Expectation over Transformation (EOT) with uniform sampling fundamentally suboptimal. Motivated by this, we propose Expectation Over Learned distribution of Transformation (EOLT), a framework that treats the transformation distribution as a learnable component rather than a fixed design choice. Specifically, EOLT employs a policy network, trained via reinforcement learning, that learns to automatically prioritize critical transformations and adaptively generate instance-specific perturbations, enabling explicit modeling of defensive bottlenecks while maintaining broad transferability. Extensive experiments demonstrate that our method achieves substantial improvements over state-of-the-art approaches, with 26% higher average robustness and up to 30% gains on challenging transformation categories.
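The contrast between uniform EOT and a learned transformation distribution can be written schematically as follows. This is a generic sketch, not the paper's own formulation: the symbols are assumptions introduced here for illustration, with x the source image, δ the perturbation under an assumed L∞ budget ε, F the face-swapping model, 𝓛 a disruption loss, 𝒯 the set of 30 transformations, and π_θ the policy that replaces the uniform distribution.

```latex
% Standard EOT: transformations drawn uniformly from the set T
\delta^{\star}_{\mathrm{EOT}} = \arg\max_{\|\delta\|_\infty \le \epsilon}
  \; \mathbb{E}_{t \sim \mathcal{U}(\mathcal{T})}
  \big[ \mathcal{L}\big(F(t(x + \delta)),\, F(x)\big) \big]

% Learned-distribution variant: transformations drawn from a policy \pi_\theta
\delta^{\star}_{\mathrm{EOLT}} = \arg\max_{\|\delta\|_\infty \le \epsilon}
  \; \mathbb{E}_{t \sim \pi_\theta(\cdot \mid x)}
  \big[ \mathcal{L}\big(F(t(x + \delta)),\, F(x)\big) \big]
```

The only difference between the two lines is the sampling distribution over t. How π_θ is parameterized and rewarded is not specified in this summary.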


Key Contributions

  • Systematic analysis of 30 image transformations revealing that uniform sampling in standard EOT is fundamentally suboptimal, creating defensive bottlenecks for protective perturbation robustness
  • EOLT framework treating the transformation distribution as a learnable component via a reinforcement learning policy network that automatically prioritizes critical transformations
  • Instance-specific perturbation generation achieving 26% higher average robustness than state-of-the-art methods, with up to 30% gains on challenging transformation categories
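The idea of learning which transformations to prioritize can be illustrated with a toy example. The sketch below is not the authors' implementation: it replaces the policy network with a plain softmax over transformation indices, and the `damage` values are a hypothetical stand-in for how much each transformation weakens the protection (in a real system this would be measured by running the face-swapping model on the transformed protected image). It uses a basic REINFORCE update with a running baseline.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def learn_sampling_distribution(damage, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE on a categorical policy over transformations.

    damage[k] is a hypothetical score for how much transformation k
    weakens the protection; higher means a harder transformation.
    Returns the learned sampling probabilities.
    """
    rng = np.random.default_rng(seed)
    damage = np.asarray(damage, dtype=float)
    logits = np.zeros(len(damage))   # zero logits = uniform sampling, as in standard EOT
    baseline = 0.0                   # running mean reward, reduces gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choice(len(damage), p=probs)          # sample one transformation
        reward = damage[k] + rng.normal(scale=0.05)   # noisy observation of its damage
        # Gradient of log pi(k) w.r.t. the logits is onehot(k) - probs.
        grad_log_pi = -probs
        grad_log_pi[k] += 1.0
        logits += lr * (reward - baseline) * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)

# Four hypothetical transformations; index 2 is the bottleneck.
probs = learn_sampling_distribution([0.1, 0.1, 0.9, 0.2])
print(probs.round(3))
```

Because `damage` is held fixed here, the policy drifts toward the single hardest transformation. In a full training loop the perturbation would be updated as well, so the damage values would shift over time and the sampling distribution would have to keep adapting.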

🛡️ Threat Analysis

Output Integrity Attack

Protective perturbations embedded in source images to disrupt deepfake face-swapping models are a proactive content integrity defense. The note in ML09 explicitly cites 'anti-deepfake perturbations' as part of this category's scope — creating such protections is the defensive counterpart of the ML09 removal attacks described there. The goal is content authenticity and preventing AI-generated identity forgery, which is the core of ML09.


Details

Domains
vision, generative
Model Types
gan, rl
Threat Tags
white_box, inference_time, digital
Applications
deepfake face swapping, facial identity forgery prevention