Decoupling Defense Strategies for Robust Image Watermarking
Jiahui Chen 1, Zehang Deng 2, Zeyu Zhang 3, Chaoyang Li 1, Lianchen Jia 1, Lifeng Sun 1
Published on arXiv
arXiv:2602.20053
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
AdvMark achieves up to 29%, 33%, and 46% bit-accuracy improvement over prior defenses for distortion, regeneration, and adversarial attacks respectively, while maintaining the highest image quality.
AdvMark
Novel technique introduced
Deep learning-based image watermarking, while robust against conventional distortions, remains vulnerable to advanced adversarial and regeneration attacks. Conventional countermeasures, which jointly optimize the encoder and decoder through a noise layer, face two inherent challenges: (1) a drop in clean accuracy caused by adversarial training of the decoder, and (2) limited robustness when training against all three advanced attack types simultaneously. To overcome these issues, we propose AdvMark, a novel two-stage fine-tuning framework that decouples the defense strategies. In stage 1, we address adversarial vulnerability with a tailored adversarial training paradigm that primarily fine-tunes the encoder and only conditionally updates the decoder. This approach learns to move the image into a non-attackable region, rather than modifying the decision boundary, and thus preserves clean accuracy. In stage 2, we tackle distortion and regeneration attacks via direct image optimization. To preserve the adversarial robustness gained in stage 1, we formulate a principled, constrained image loss with theoretical guarantees that balances deviation from the cover image against deviation from the previously encoded image. We also propose a quality-aware early stop that further guarantees a lower bound on visual quality. Extensive experiments demonstrate that AdvMark achieves the highest image quality and the most comprehensive robustness, with up to 29%, 33%, and 46% accuracy improvements against distortion, regeneration, and adversarial attacks, respectively.
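The stage-1 idea of "moving the image into a non-attackable region" can be illustrated with a toy example. The sketch below is my own minimal model, not the paper's architecture: the decoder is a fixed linear classifier reading a watermark bit as `sign(w @ x)`, the adversary is a worst-case L∞ perturbation with budget `eps`, and encoder-side fine-tuning is gradient ascent on the decoder margin while the decoder weights stay frozen, mirroring the principle that the decision boundary is left untouched.

```python
import numpy as np

# Toy 1-D illustration (an assumption, not the paper's model): the decoder
# reads a watermark bit as sign(w @ x). An L_inf adversary with budget eps
# can flip the bit whenever the margin |w @ x| is below eps * ||w||_1.
# AdvMark's stage 1 primarily moves the *encoded image* past that margin
# ("non-attackable region") instead of retraining the decoder.

rng = np.random.default_rng(0)
w = rng.normal(size=8)            # frozen decoder weights (decision boundary)
x = rng.normal(size=8) * 0.1      # encoded image feature, initially low-margin
eps = 0.5
margin = eps * np.abs(w).sum()    # worst-case logit shift under the attack

def attacked_bit(x):
    # strongest L_inf perturbation against the current prediction
    delta = -eps * np.sign(w) * np.sign(w @ x)
    return np.sign(w @ (x + delta))

bit = np.sign(w @ x)
# encoder-side fine-tuning: gradient ascent on the margin, decoder untouched
for _ in range(200):
    x += 0.05 * bit * w           # d(bit * (w @ x))/dx = bit * w
    if bit * (w @ x) > margin:    # image now sits outside the attackable region
        break

assert attacked_bit(x) == bit     # the attack can no longer flip the bit
```

The key property this demonstrates: robustness is gained by relocating the encoded sample, so the decoder's behavior on clean inputs (its decision boundary `w`) is unchanged, which is why clean accuracy is preserved.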
Key Contributions
- Two-stage decoupled fine-tuning framework (AdvMark) that separately addresses adversarial vulnerability (stage 1) and distortion/regeneration attacks (stage 2), avoiding the clean-accuracy degradation inherent to joint training approaches.
- Adversarial training paradigm that primarily fine-tunes the encoder to move images into a non-attackable region, preserving clean decoder accuracy while gaining adversarial robustness.
- Constrained image optimization with theoretical guarantees and a quality-aware early-stop mechanism for stage 2, balancing robustness against regeneration/distortion attacks with visual quality.
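The stage-2 constrained optimization with a quality-aware early stop can be sketched as follows. The exact loss and guarantees from the paper are not reproduced here; this sketch assumes a simple weighted form that balances deviation from the cover image `x_cov` (visual quality) and from the stage-1 encoded image `x_enc` (retaining stage-1 adversarial robustness), with a PSNR floor standing in for the quality-aware early stop. The robustness gradient `grad_robust` is a hypothetical stand-in for the paper's model-based term.

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(peak ** 2 / mse) if mse > 0 else np.inf

def stage2_optimize(x_enc, x_cov, grad_robust, steps=100, lr=0.01,
                    alpha=1.0, beta=1.0, psnr_floor=40.0):
    """Illustrative direct image optimization (assumed form, not the paper's):
    descend a robustness loss while penalizing deviation from both the cover
    image and the stage-1 encoded image; stop before PSNR crosses the floor."""
    x = x_enc.copy()
    for _ in range(steps):
        g = grad_robust(x) + alpha * (x - x_cov) + beta * (x - x_enc)
        x_next = np.clip(x - lr * g, 0.0, 1.0)
        if psnr(x_next, x_cov) < psnr_floor:   # quality-aware early stop:
            break                              # never accept a step below it
        x = x_next
    return x

# toy usage: the "robustness" gradient pulls pixels toward 0.5 (a made-up
# surrogate for an attack-resistance objective)
rng = np.random.default_rng(1)
x_cov = rng.random((8, 8))
x_enc = np.clip(x_cov + 0.005 * rng.normal(size=(8, 8)), 0.0, 1.0)
x_out = stage2_optimize(x_enc, x_cov, lambda x: x - 0.5)
assert psnr(x_out, x_cov) >= 40.0  # early stop enforces the quality floor
```

Because every accepted step is checked against the PSNR floor before being committed, the returned image can never fall below it, which is the mechanism by which an early stop can guarantee a lower bound on visual quality.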
🛡️ Threat Analysis
The paper defends image content watermarks — marks embedded in image outputs to trace provenance — against attacks that destroy or circumvent them (adversarial, regeneration, distortion). This is output integrity protection for watermarked content, not model-weight watermarking for IP protection.