DeMark: A Query-Free Black-Box Attack on Deepfake Watermarking Defenses
Wei Song 1, Zhenchang Xing 2, Liming Zhu 1, Yulei Sui 2, Jingling Xue 1
Published on arXiv
2601.16473
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
DeMark reduces watermark detection accuracy from 100% to 32.9% on average across eight state-of-the-art schemes without querying the watermarking model
DeMark
Novel technique introduced
The rapid proliferation of realistic deepfakes has raised urgent concerns over their misuse, motivating the use of defensive watermarks in synthetic images for reliable detection and provenance tracking. However, this defense paradigm assumes such watermarks are inherently resistant to removal. We challenge this assumption with DeMark, a query-free black-box attack framework that targets defensive image watermarking schemes for deepfakes. DeMark exploits latent-space vulnerabilities in encoder-decoder watermarking models through a compressive-sensing-based sparsification process, suppressing watermark signals while preserving perceptual and structural realism appropriate for deepfakes. Across eight state-of-the-art watermarking schemes, DeMark reduces watermark detection accuracy from 100% to 32.9% on average while maintaining natural visual quality, outperforming existing attacks. We further evaluate three defense strategies (image super-resolution, sparse watermarking, and adversarial training) and find them largely ineffective. These results demonstrate that current encoder-decoder watermarking schemes remain vulnerable to latent-space manipulations, underscoring the need for more robust watermarking methods to safeguard against deepfakes.
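The abstract describes suppressing watermark signals by sparsifying a latent representation. The following is only a minimal sketch of that general idea, not the paper's algorithm: it assumes a latent vector is already available and applies plain top-k hard thresholding, a basic sparsification step often used in compressive sensing. The function name `sparsify_top_k` and the toy `content` and `watermark` values are hypothetical and chosen purely for illustration.

```python
from typing import List


def sparsify_top_k(latent: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of `latent`; zero out the rest."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k >= len(latent):
        return list(latent)
    # Indices sorted by descending magnitude; ties broken by original position.
    order = sorted(range(len(latent)), key=lambda i: (-abs(latent[i]), i))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(latent)]


# Toy latent (assumed values): a few strong "content" coefficients plus a
# weak, spread-out perturbation standing in for a watermark signal.
content = [5.0, 0.0, -4.0, 0.0, 0.0, 3.0, 0.0, 0.0]
watermark = [0.1, -0.1, 0.1, 0.1, -0.1, 0.1, -0.1, 0.1]
latent = [c + w for c, w in zip(content, watermark)]

sparse = sparsify_top_k(latent, k=3)

# Whatever of the perturbation survives is the difference from pure content.
residual = [s - c for s, c in zip(sparse, content)]
watermark_energy = sum(w * w for w in watermark)
residual_energy = sum(r * r for r in residual)
```

In this toy setting the dominant coefficients survive while the low-amplitude perturbation is zeroed everywhere outside the kept positions, so `residual_energy` is smaller than `watermark_energy`. A real attack would operate on the latent space of an image model and would need to balance sparsity against visual quality, which this sketch does not attempt.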
Key Contributions
- DeMark: a query-free black-box watermark removal framework that exploits latent-space vulnerabilities in encoder-decoder watermarking models via compressive-sensing-based sparsification
- Evaluated against eight state-of-the-art deepfake watermarking schemes, reducing average detection accuracy from 100% to 32.9% while preserving visual quality
- Systematic evaluation of three proposed defenses (super-resolution, sparse watermarking, adversarial training), finding all largely ineffective against DeMark
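The headline metric in the contributions above is watermark detection accuracy, which this summary does not define. As a rough sketch, one common convention (an assumption here, not confirmed by the source) is to declare a watermark detected when the fraction of correctly decoded message bits reaches a threshold, and to report the share of images for which that happens. The function names, the `threshold` default, and the example bit strings below are all hypothetical.

```python
from typing import Sequence


def bit_accuracy(decoded: Sequence[int], embedded: Sequence[int]) -> float:
    """Fraction of decoded bits that match the embedded message."""
    if len(decoded) != len(embedded) or len(embedded) == 0:
        raise ValueError("messages must be non-empty and of equal length")
    matches = sum(1 for d, e in zip(decoded, embedded) if d == e)
    return matches / len(embedded)


def detection_accuracy(
    decoded_batch: Sequence[Sequence[int]],
    embedded: Sequence[int],
    threshold: float = 0.75,  # assumed decision threshold, not from the source
) -> float:
    """Share of images whose bit accuracy reaches `threshold`."""
    if len(decoded_batch) == 0:
        raise ValueError("decoded_batch must not be empty")
    detected = sum(
        1 for decoded in decoded_batch
        if bit_accuracy(decoded, embedded) >= threshold
    )
    return detected / len(decoded_batch)


# Illustrative 8-bit message and three decoded outputs (made-up values).
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
decoded_batch = [
    [1, 0, 1, 1, 0, 0, 1, 0],  # all bits recovered
    [0, 1, 0, 0, 0, 0, 1, 0],  # four bits flipped
    [1, 0, 1, 1, 0, 0, 1, 1],  # one bit flipped
]

rate = detection_accuracy(decoded_batch, embedded)
```

Under this convention, a successful removal attack pushes per-image bit accuracy below the threshold, which lowers the reported detection accuracy.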
🛡️ Threat Analysis
DeMark is a watermark removal attack targeting content watermarks embedded in deepfake images for provenance and detection, so it defeats output integrity and authentication schemes. Per the taxonomy, 'attacks that REMOVE or DEFEAT image protections (watermarks, anti-deepfake perturbations)' are classified as ML09, not ML01, even when latent-space manipulation is involved.