Beauty and the Beast: Imperceptible Perturbations Against Diffusion-Based Face Swapping via Directional Attribute Editing
Published on arXiv: 2601.22744
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
FaceDefense achieves a superior imperceptibility–defense effectiveness trade-off over existing proactive defense methods (including MyFace) across all tested perturbation budgets against diffusion-based face swapping
FaceDefense
Novel technique introduced
Diffusion-based face swapping achieves state-of-the-art performance, but it also amplifies the harm of malicious face swapping, which can violate portrait rights or damage personal reputation. This has spurred the development of proactive defense methods. However, existing approaches face a core trade-off: large perturbations distort facial structures, while small ones weaken protection. To address this trade-off, we propose FaceDefense, an enhanced proactive defense framework against diffusion-based face swapping. Our method introduces a new diffusion loss to strengthen the defensive efficacy of adversarial examples and employs directional facial attribute editing to repair perturbation-induced distortions, thereby improving visual imperceptibility. A two-phase alternating optimization strategy generates the final perturbed face images. Extensive experiments show that FaceDefense significantly outperforms existing methods in both imperceptibility and defense effectiveness, achieving a superior trade-off.
Key Contributions
- Identifies that latent-space perturbations in LDMs preferentially distort high-level facial semantics (eyes, nose) while leaving compressed regions (hair, background) intact, explaining visible facial degradation in prior methods
- Proposes FaceDefense, which employs directional facial attribute editing in the W+ space to restore perturbation-induced facial distortions, improving imperceptibility without sacrificing defense strength
- Introduces a two-phase alternating optimization strategy that jointly minimizes a novel diffusion loss and an attribute-editing restoration objective to achieve a superior imperceptibility–effectiveness trade-off
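The control flow of a two-phase alternating optimization under a perturbation budget can be illustrated with a toy sketch. This is not the paper's implementation: the linear "defense loss", the masked pull-back used as a "restoration" step, and every name below (`alternating_defense`, `WEIGHTS`, `FACE_MASK`, and so on) are hypothetical stand-ins. In particular, the real method uses a diffusion loss and attribute editing in the W+ space, neither of which is modeled here. The sketch only shows how an attack-strengthening step and a distortion-repairing step might alternate while the perturbation stays inside an L-infinity ball.

```python
from typing import Callable, List

Vector = List[float]

# Hypothetical toy data: 4 "pixels", the first two standing in for facial regions.
WEIGHTS: Vector = [1.0, -1.0, 1.0, -1.0]   # assumed gradient direction of a toy defense loss
FACE_MASK: Vector = [1.0, 1.0, 0.0, 0.0]   # assumed mask of regions to restore


def toy_defense_loss(x: Vector) -> float:
    """Stand-in for a defense loss (NOT the paper's diffusion loss): centered linear score."""
    return sum(w * (xi - 0.5) for w, xi in zip(WEIGHTS, x))


def toy_defense_grad(x: Vector) -> Vector:
    """Gradient of the toy defense loss with respect to x (constant for a linear loss)."""
    return list(WEIGHTS)


def toy_restore_grad(x_adv: Vector, x_clean: Vector) -> Vector:
    """Gradient of 0.5 * sum(mask * (x_adv - x_clean)^2), a stand-in for restoration."""
    return [m * (a - c) for m, a, c in zip(FACE_MASK, x_adv, x_clean)]


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def clip_to_budget(x_adv: Vector, x_clean: Vector, eps: float) -> Vector:
    """Project onto the L-infinity ball of radius eps around x_clean, within [0, 1]."""
    out = []
    for a, c in zip(x_adv, x_clean):
        lower, upper = max(c - eps, 0.0), min(c + eps, 1.0)
        out.append(min(max(a, lower), upper))
    return out


def alternating_defense(
    x_clean: Vector,
    defense_grad: Callable[[Vector], Vector],
    restore_grad: Callable[[Vector, Vector], Vector],
    eps: float = 0.05,
    step: float = 0.01,
    rounds: int = 20,
) -> Vector:
    x_adv = list(x_clean)
    for _ in range(rounds):
        # Phase 1: signed gradient ascent on the defense loss, then project to the budget.
        g = defense_grad(x_adv)
        x_adv = [a + step * sign(gi) for a, gi in zip(x_adv, g)]
        x_adv = clip_to_budget(x_adv, x_clean, eps)
        # Phase 2: gradient descent on the restoration loss, then project again.
        g = restore_grad(x_adv, x_clean)
        x_adv = [a - step * gi for a, gi in zip(x_adv, g)]
        x_adv = clip_to_budget(x_adv, x_clean, eps)
    return x_adv


clean = [0.5, 0.5, 0.5, 0.5]
protected = alternating_defense(clean, toy_defense_grad, toy_restore_grad)
print(protected, toy_defense_loss(protected))
```

In this toy setup the masked entries end up slightly closer to the clean values than the unmasked ones, which mirrors, very loosely, the idea of repairing distortion in facial regions while keeping the perturbation effective elsewhere.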
🛡️ Threat Analysis
Defends against malicious AI-generated content (deepfake face swapping) by crafting protective perturbations that corrupt diffusion model outputs — a proactive anti-deepfake content integrity defense. Per the defense tagging rule, the category reflects the threat being defended against: unauthorized AI-generated synthetic faces.