Enhancing Robustness in Post-Processing Watermarking: An Ensemble Attack Network Using CNNs and Transformers
Tzuhsuan Huang 1,2, Cheng Yu Yeo 1,2, Tsai-Ling Huang 2, Hong-Han Shuai 2, Wen-Huang Cheng 3,1, Jun-Cheng Chen 1
Published on arXiv
2509.03006
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Combining a CNN-based spatial-domain attack network with a Transformer-based frequency-domain attack network in an ensemble training scheme improves StegaStamp's robustness, measured by average bit accuracy, by 18.743% against the Regeneration Attack in the WAVES benchmark.
DeepRobustWatermark
Novel technique introduced
Recent studies on deep watermarking have predominantly focused on in-processing watermarking, which integrates the watermarking process into image generation. However, post-processing watermarking, which embeds watermarks after image generation, offers more flexibility. It can be applied to outputs from any generative model (e.g., GANs, diffusion models) without needing access to the model's internal structure. It also allows users to embed unique watermarks into individual images. Therefore, this study focuses on post-processing watermarking and enhances its robustness by incorporating an ensemble attack network during training. We construct various versions of attack networks using CNN and Transformer in both spatial and frequency domains to investigate how each combination influences the robustness of the watermarking model. Our results demonstrate that combining a CNN-based attack network in the spatial domain with a Transformer-based attack network in the frequency domain yields the highest robustness in watermarking models. Extensive evaluation on the WAVES benchmark, using average bit accuracy as the metric, demonstrates that our ensemble attack network significantly enhances the robustness of baseline watermarking methods under various stress tests. In particular, for the Regeneration Attack defined in WAVES, our method improves StegaStamp by 18.743%. The code is released at: https://github.com/aiiu-lab/DeepRobustWatermark.
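The abstract reports results as average bit accuracy. As a rough illustration of what that metric usually means for watermark decoding, the sketch below computes the fraction of message bits recovered correctly for one image and then averages over several images. The function names and the list-of-bits representation are assumptions made for this example; they are not taken from the released code or from WAVES.

```python
def bit_accuracy(decoded_bits, original_bits):
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if len(decoded_bits) != len(original_bits):
        raise ValueError("bit strings must have the same length")
    if not original_bits:
        raise ValueError("bit strings must not be empty")
    matches = sum(d == o for d, o in zip(decoded_bits, original_bits))
    return matches / len(original_bits)


def average_bit_accuracy(pairs):
    """Mean bit accuracy over (decoded_bits, original_bits) pairs, one pair per image."""
    accuracies = [bit_accuracy(decoded, original) for decoded, original in pairs]
    return sum(accuracies) / len(accuracies)


# Hypothetical decoded/original messages for two attacked images.
example_pairs = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),  # 3 of 4 bits recovered
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # all 4 bits recovered
]
score = average_bit_accuracy(example_pairs)
```

Under this definition, a decoder that guesses at random would score about 0.5, so values well above that level indicate that the watermark survived the attack.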
Key Contributions
- Ensemble attack network combining CNN (spatial domain) and Transformer (frequency domain) attack networks for adversarial training of watermarking models
- Systematic ablation of CNN vs. Transformer and spatial vs. frequency domain combinations to identify the most effective configuration for watermark robustness
- An 18.743% improvement for StegaStamp on the Regeneration Attack in the WAVES benchmark when it is trained with the proposed ensemble strategy
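The contributions above distinguish attacks that act in the spatial domain from attacks that act in the frequency domain. The toy sketch below illustrates only that distinction on a 1-D signal: a fixed neighbour-blending filter stands in for the learned CNN attack, and a fixed low-pass filter on DFT coefficients stands in for the learned Transformer attack. Every function name and parameter is hypothetical, the real attack networks are trained rather than fixed, and this summary does not say how their outputs or losses are combined, so the sketch simply returns both attacked copies.

```python
import cmath


def spatial_attack(signal, strength=0.5):
    """Spatial-domain stand-in: blend each sample with the mean of its two neighbours."""
    n = len(signal)
    attacked = []
    for i in range(n):
        neighbour_mean = (signal[(i - 1) % n] + signal[(i + 1) % n]) / 2
        attacked.append((1 - strength) * signal[i] + strength * neighbour_mean)
    return attacked


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), adequate for a toy example."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def frequency_attack(signal, keep=2):
    """Frequency-domain stand-in: zero every coefficient above frequency index `keep`."""
    spectrum = dft(signal)
    n = len(spectrum)
    for k in range(n):
        # min(k, n - k) treats k and n - k as the same frequency, keeping the output real.
        if min(k, n - k) > keep:
            spectrum[k] = 0
    return idft(spectrum)


def ensemble_attack(signal):
    """Apply both stand-in attacks and return one attacked copy per domain."""
    return {"spatial": spatial_attack(signal), "frequency": frequency_attack(signal)}
```

In an adversarial training loop, a watermark decoder would be asked to recover the message from each attacked copy, which is why exposing it to both domains can cover distortions that a single attack would miss.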
🛡️ Threat Analysis
Directly addresses the robustness of content watermarking for AI-generated images. Watermarks are embedded in image outputs (not model weights) to enable provenance tracking, and the ensemble attack network is an adversarial training technique that hardens these content watermarks against removal attacks, including regeneration attacks.