Boosting Active Defense Persistence: A Two-Stage Defense Framework Combining Interruption and Poisoning Against Deepfake

Hongrui Zheng 1, Yuezun Li 2, Liejun Wang 1, Yunfeng Diao 3, Zhiqing Guo 1


Published on arXiv: 2508.07795

Output Integrity Attack

OWASP ML Top 10 — ML09

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

TSDF maintains strong dual defense capability against adversarial retraining scenarios where traditional interruption-only methods degrade sharply in effectiveness.

TSDF (Two-Stage Defense Framework)

Novel technique introduced


Active defense strategies have been developed to counter the threat of deepfake technology. However, a primary challenge is their lack of persistence: their effectiveness is often short-lived, because attackers can bypass these defenses simply by collecting protected samples and retraining their models. Static defenses therefore inevitably fail under retraining, which severely limits their practical use. We argue that an effective defense must not only distort forged content but also block the model's ability to adapt when attackers retrain on protected images. To this end, we propose an innovative Two-Stage Defense Framework (TSDF). Built on the intensity separation mechanism designed in this paper, the framework uses dual-function adversarial perturbations that play two roles. First, they directly distort the forged results. Second, they act as a poisoning vehicle that disrupts the data preparation process essential to an attacker's retraining pipeline. By poisoning the data source, TSDF aims to prevent the attacker's model from adapting to the defensive perturbations, keeping the defense effective long-term. Comprehensive experiments show that the performance of traditional interruption methods degrades sharply when subjected to adversarial retraining, whereas our framework retains strong dual defense capability and thus improves the persistence of active defense. Our code will be available at https://github.com/vpsg-research/TSDF.


Key Contributions

  • Proposes TSDF, a two-stage framework using dual-function adversarial perturbations that simultaneously distort deepfake outputs at inference time and poison the attacker's retraining data
  • Introduces an intensity separation mechanism that allows a single perturbation to serve both interruption and poisoning roles with controllable strength allocation
  • Demonstrates that traditional interruption-only defenses fail under adversarial retraining while TSDF maintains effectiveness long-term
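The intensity separation idea above can be illustrated with a minimal sketch: a single L∞ perturbation budget is split between the interruption and poisoning roles by a mixing coefficient. The function name, the `alpha` split, and the budgeting scheme here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def split_perturbation(delta_interrupt, delta_poison, eps_total=8 / 255, alpha=0.6):
    """Allocate a shared L-inf budget between the two perturbation roles.

    alpha controls the interruption/poisoning strength allocation; the
    names and clipping scheme are illustrative, not TSDF's exact method.
    """
    # Interruption component gets alpha of the budget, poisoning the rest.
    d_i = np.clip(delta_interrupt, -alpha * eps_total, alpha * eps_total)
    d_p = np.clip(delta_poison, -(1 - alpha) * eps_total, (1 - alpha) * eps_total)
    # The combined perturbation is guaranteed to stay within the total budget.
    return np.clip(d_i + d_p, -eps_total, eps_total)
```

Because the two components are clipped to complementary fractions of the budget, their sum never exceeds the overall bound, so one embedded perturbation can carry both signals.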

🛡️ Threat Analysis

Data Poisoning Attack

A core contribution is the explicit use of data poisoning as a defense mechanism: the framework poisons the attacker's retraining dataset by embedding poisoning signals in protected images, preventing the adversary's model from adapting to defensive perturbations.
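One common way to realize such a poisoning signal is a signed-gradient ascent step on a surrogate of the attacker's retraining loss, applied to each protected image. The sketch below assumes that setup; `grad_surrogate_loss`, the step size, and the budget are hypothetical stand-ins, not the paper's actual algorithm.

```python
import numpy as np

def poison_step(image, grad_surrogate_loss, step=1 / 255, eps=4 / 255, delta=None):
    """One signed-gradient ascent step crafting a poisoning perturbation.

    grad_surrogate_loss is the gradient of a surrogate retraining loss
    w.r.t. the input; ascending it is one way to hinder the attacker's
    adaptation. All names here are illustrative.
    """
    if delta is None:
        delta = np.zeros_like(image)
    # Move in the direction that increases the surrogate loss, then
    # project back into the L-inf ball and the valid pixel range.
    delta = np.clip(delta + step * np.sign(grad_surrogate_loss), -eps, eps)
    return np.clip(image + delta, 0.0, 1.0), delta
```

Iterating this step over the protected set embeds a perturbation that, once scraped into the attacker's training data, degrades the retrained model's ability to adapt.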

Output Integrity Attack

The paper's primary goal is defending against deepfakes (AI-generated content) — it adds adversarial perturbations to face images to distort forged outputs and protect content integrity, which is squarely within output integrity / content protection.
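The interruption role can be framed as maximizing the distance between the deepfake model's output on a protected face and its output on the clean face. A minimal sketch of that objective, using L2 distance as an assumed stand-in for whatever loss the paper actually optimizes:

```python
import numpy as np

def interruption_objective(forged_protected, forged_clean):
    """Interruption objective: push the deepfake model's output on a
    protected face away from its output on the clean face.

    Mean squared error is used here for illustration; the paper's exact
    distortion loss may differ.
    """
    return float(np.mean((forged_protected - forged_clean) ** 2))
```

A defender would craft the perturbation by gradient ascent on this objective, so the forgery produced from a protected image is visibly corrupted.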


Details

Domains
vision, generative
Model Types
gan
Threat Tags
training_time, inference_time, digital
Datasets
FaceForensics++, CelebA
Applications
deepfake face generation, face manipulation, active deepfake defense