DeMark: A Query-Free Black-Box Attack on Deepfake Watermarking Defenses

The rapid proliferation of realistic deepfakes has raised urgent concerns over their misuse, motivating the use of defensive watermarks in synthetic images for reliable detection and provenance tracking. However, this defense paradigm assumes such watermarks are inherently resistant to removal. We challenge this assumption with DeMark, a query-free black-box attack framework that targets defensive image watermarking schemes for deepfakes. DeMark exploits latent-space vulnerabilities in encoder-decoder watermarking models through a compressive sensing-based sparsification process, suppressing watermark signals while preserving the perceptual and structural realism appropriate for deepfakes. Across eight state-of-the-art watermarking schemes, DeMark reduces watermark detection accuracy from 100% to 32.9% on average while maintaining natural visual quality, outperforming existing attacks. We further evaluate three defense strategies (image super-resolution, sparse watermarking, and adversarial training) and find them largely ineffective. These results demonstrate that current encoder-decoder watermarking schemes remain vulnerable to latent-space manipulations, underscoring the need for more robust watermarking methods to safeguard against deepfakes.
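This summary does not spell out how DeMark's compressive sensing-based sparsification works. As a rough, hypothetical illustration of the underlying idea only, the sketch below runs ISTA (iterative shrinkage-thresholding), a standard compressive sensing recovery algorithm, on a toy latent vector: it recovers a sparse code x from measurements y ≈ Ax by minimizing 0.5‖Ax − y‖² + λ‖x‖₁. The measurement matrix A, the penalty λ, the toy latent, and the premise that a small, dense watermark signal is what the ℓ1 penalty tends to discard are all assumptions for illustration, not the authors' algorithm.

```python
import numpy as np


def soft_threshold(v: np.ndarray, thr: float) -> np.ndarray:
    """Proximal operator of thr * ||x||_1: shrink every entry toward zero by thr."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)


def lasso_objective(A: np.ndarray, y: np.ndarray, x: np.ndarray, lam: float) -> float:
    """F(x) = 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    r = A @ x - y
    return 0.5 * float(r @ r) + lam * float(np.abs(x).sum())


def ista(A: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 500) -> np.ndarray:
    """Recover a sparse x from y ≈ A x with ISTA (proximal gradient on the LASSO)."""
    # Lipschitz constant of the gradient of 0.5*||Ax - y||^2 is the squared spectral norm.
    L = np.linalg.norm(A, 2) ** 2
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)  # gradient of the data-fit term
        x = soft_threshold(x - step * grad, lam * step)  # l1 shrinkage enforces sparsity
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 256, 128, 8  # hypothetical latent size, measurements, sparsity
    z = np.zeros(n)
    z[rng.choice(n, size=k, replace=False)] = rng.normal(0.0, 2.0, size=k)  # toy sparse "content" latent
    w = 0.03 * rng.choice([-1.0, 1.0], size=n)  # toy dense, low-amplitude "watermark" (assumption)
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))  # random Gaussian measurement matrix
    x_hat = ista(A, A @ (z + w), lam=0.05)
    print(np.count_nonzero(x_hat), "of", n, "latent coefficients kept")
```

The ℓ1 penalty favors a reconstruction that keeps a few large coefficients and zeros the rest, which is the intuition for why a low-amplitude, spread-out perturbation could be suppressed while dominant content survives (shrunk by a small bias). Whether this matches DeMark's behavior on real encoder-decoder latents is not established by this sketch.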

Key Contributions

DeMark: a query-free black-box watermark removal framework that exploits latent-space vulnerabilities in encoder-decoder watermarking models via compressive sensing-based sparsification
Evaluation against eight state-of-the-art deepfake watermarking schemes, showing that DeMark reduces average detection accuracy from 100% to 32.9% while preserving visual quality
Systematic evaluation of three proposed defenses (super-resolution, sparse watermarking, adversarial training), finding all of them largely ineffective against DeMark
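The summary reports detection accuracy without defining it. One common convention in the watermarking literature, assumed here and not confirmed by the summary, decodes the embedded bit string from each image and counts the image as detected when its bit accuracy reaches a threshold; detection accuracy is then the fraction of images detected, and an "on average" figure would presumably be the mean of that fraction across the evaluated schemes. The function names and the 0.9 threshold below are hypothetical.

```python
from typing import Sequence


def bit_accuracy(decoded: Sequence[int], embedded: Sequence[int]) -> float:
    """Fraction of watermark bits that the decoder recovers correctly."""
    if not embedded or len(decoded) != len(embedded):
        raise ValueError("bit strings must be non-empty and of equal length")
    matches = sum(d == e for d, e in zip(decoded, embedded))
    return matches / len(embedded)


def detection_accuracy(
    decoded_batch: Sequence[Sequence[int]],
    embedded: Sequence[int],
    threshold: float = 0.9,  # hypothetical decision threshold
) -> float:
    """Fraction of images still flagged as watermarked (bit accuracy >= threshold)."""
    if not decoded_batch:
        raise ValueError("need at least one decoded image")
    flagged = sum(bit_accuracy(bits, embedded) >= threshold for bits in decoded_batch)
    return flagged / len(decoded_batch)


def mean_detection_accuracy(per_scheme: Sequence[float]) -> float:
    """Average detection accuracy across watermarking schemes."""
    if not per_scheme:
        raise ValueError("need at least one scheme")
    return sum(per_scheme) / len(per_scheme)
```

For instance, if every attacked image still decodes 28 of 32 bits correctly (bit accuracy 0.875), a 0.9 threshold would count none of them as detected while a 0.85 threshold would count all of them, so under this convention the reported figure depends on the threshold each scheme uses.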

🛡️ Threat Analysis

Output Integrity Attack

DeMark is a watermark removal attack targeting content watermarks embedded in deepfake images for provenance tracking and detection; it defeats output integrity and authentication schemes. Under the taxonomy, attacks that remove or defeat image protections (watermarks, anti-deepfake perturbations) are classified as ML09 (Output Integrity Attack), not ML01, even when latent-space manipulation is involved.

Details

Domains

vision, generative

Model Types

gan, diffusion, cnn

Threat Tags

black_box, inference_time, targeted, digital

Applications

2026 · 0 citations