Towards Robust Protective Perturbation against DeepFake Face Swapping

DeepFake face swapping enables highly realistic identity forgeries, posing serious privacy and security risks. A common defence embeds invisible perturbations into images, but these are fragile and often destroyed by basic transformations such as compression or resizing. In this paper, we first conduct a systematic analysis of 30 transformations across six categories and show that protection robustness is highly sensitive to the choice of training transformations, making the standard Expectation over Transformation (EOT) with uniform sampling fundamentally suboptimal. Motivated by this, we propose Expectation Over Learned distribution of Transformation (EOLT), a framework that treats the transformation distribution as a learnable component rather than a fixed design choice. Specifically, EOLT employs a policy network, trained via reinforcement learning, that learns to automatically prioritize critical transformations and adaptively generate instance-specific perturbations, enabling explicit modeling of defensive bottlenecks while maintaining broad transferability. Extensive experiments demonstrate that our method achieves substantial improvements over state-of-the-art approaches, with 26% higher average robustness and up to 30% gains on challenging transformation categories.
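
As a rough formalization of the contrast drawn above (the symbols are illustrative assumptions, not notation taken from the paper): let x be the image, δ a perturbation bounded by ε (an ℓ∞ bound is assumed here), t_1, ..., t_K the candidate transformations, and L a loss measuring how strongly the face-swapping model is disrupted. Uniform EOT weights every transformation equally, whereas a learned distribution replaces those weights with an image-conditioned policy π_θ.

```latex
% Uniform EOT: every transformation gets weight 1/K
\max_{\|\delta\|_\infty \le \epsilon} \; \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}\bigl(t_k(x + \delta)\bigr)

% Learned distribution: weights come from a policy \pi_\theta (assumed form)
\max_{\|\delta\|_\infty \le \epsilon} \; \sum_{k=1}^{K} \pi_\theta(t_k \mid x)\, \mathcal{L}\bigl(t_k(x + \delta)\bigr)
```

Setting π_θ(t_k | x) = 1/K for all k recovers the uniform objective, so under this reading uniform EOT is one fixed point in the space of distributions the policy can express.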

Key Contributions

Systematic analysis of 30 image transformations revealing that uniform sampling in standard EOT is fundamentally suboptimal, creating defensive bottlenecks for protective perturbation robustness
EOLT framework treating the transformation distribution as a learnable component via a reinforcement learning policy network that automatically prioritizes critical transformations
Instance-specific perturbation generation achieving 26% higher average robustness over state-of-the-art methods with up to 30% gains on challenging transformation categories
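
The idea of learning which transformations to prioritize can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a categorical policy over a handful of transformations, a made-up `damage` score per transformation standing in for how badly that transformation weakens the protection, and a plain REINFORCE update with a running baseline. The function name `train_policy` and all hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(damage, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE over a categorical transformation policy.

    damage[k] is a hypothetical score in [0, 1] for how strongly
    transformation k degrades the protection (an assumption made
    for illustration; a real system would measure this on images).
    """
    rng = random.Random(seed)
    indices = list(range(len(damage)))
    logits = [0.0] * len(damage)  # zero logits = uniform sampling, as in standard EOT
    baseline = 0.0                # running average of rewards, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choices(indices, weights=probs)[0]  # sample one transformation
        reward = damage[k]        # policy is rewarded for finding bottlenecks
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(k) w.r.t. the logits is onehot(k) - probs.
        for j in indices:
            grad = (1.0 if j == k else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical damage scores for three transformations.
    print(train_policy([0.1, 0.2, 0.9]))
```

In this sketch the sampling distribution drifts away from uniform and toward the most damaging transformation. A practical version would presumably condition the policy on the image and keep some probability mass on the remaining transformations to preserve transferability; both are left out here for brevity.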

🛡️ Threat Analysis

Output Integrity Attack

Protective perturbations embedded in source images to disrupt deepfake face-swapping models are a proactive content integrity defense. The note in ML09 explicitly cites 'anti-deepfake perturbations' as part of this category's scope — creating such protections is the defensive counterpart of the ML09 removal attacks described there. The goal is content authenticity and preventing AI-generated identity forgery, which is the core of ML09.

Details

Domains

vision, generative

Model Types

gan, rl

Threat Tags

white_box, inference_time, digital

Applications

2025 0 cit.
