defense 2025

Enhancing Robustness in Post-Processing Watermarking: An Ensemble Attack Network Using CNNs and Transformers

Tzuhsuan Huang ¹,², Cheng Yu Yeo ¹,², Tsai-Ling Huang ², Hong-Han Shuai ², Wen-Huang Cheng ³,¹, Jun-Cheng Chen ¹

¹ Academia Sinica

² National Yang Ming Chiao Tung University

³ National Taiwan University

0 citations

Published on arXiv: 2509.03006

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Combining a CNN-based spatial-domain attack network with a Transformer-based frequency-domain attack network in an ensemble training scheme improves StegaStamp robustness by 18.743% against regeneration attacks on the WAVES benchmark.

DeepRobustWatermark

Novel technique introduced

Recent studies on deep watermarking have predominantly focused on in-processing watermarking, which integrates the watermarking process into image generation. However, post-processing watermarking, which embeds watermarks after image generation, offers more flexibility. It can be applied to outputs from any generative model (e.g., GANs, diffusion models) without needing access to the model's internal structure. It also allows users to embed unique watermarks into individual images. Therefore, this study focuses on post-processing watermarking and enhances its robustness by incorporating an ensemble attack network during training. We construct various versions of attack networks using CNNs and Transformers in both the spatial and frequency domains to investigate how each combination influences the robustness of the watermarking model. Our results demonstrate that combining a CNN-based attack network in the spatial domain with a Transformer-based attack network in the frequency domain yields the highest robustness in watermarking models. Extensive evaluation on the WAVES benchmark, using average bit accuracy as the metric, demonstrates that our ensemble attack network significantly enhances the robustness of baseline watermarking methods under various stress tests. In particular, for the Regeneration Attack defined in WAVES, our method improves StegaStamp by 18.743%. The code is released at: https://github.com/aiiu-lab/DeepRobustWatermark.
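The abstract reports results as average bit accuracy. As a minimal sketch of how such a metric is commonly computed (the function names below are hypothetical and not taken from the paper or its repository), bit accuracy is the fraction of decoded watermark bits that match the embedded message, averaged over all evaluated images:

```python
from typing import Iterable, Sequence, Tuple


def bit_accuracy(embedded: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of decoded bits that match the embedded message."""
    if len(embedded) != len(decoded) or not embedded:
        raise ValueError("messages must be non-empty and of equal length")
    matches = sum(1 for e, d in zip(embedded, decoded) if e == d)
    return matches / len(embedded)


def average_bit_accuracy(
    pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]
) -> float:
    """Mean bit accuracy over (embedded, decoded) pairs, one pair per image."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one message pair")
    return sum(bit_accuracy(e, d) for e, d in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative 8-bit messages (made-up values, not results from the paper).
    pairs = [
        ([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 0, 0, 1, 0]),  # 8 of 8 bits match
        ([1, 0, 1, 1, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0, 1, 0]),  # 6 of 8 bits match
    ]
    print(average_bit_accuracy(pairs))  # 0.875
```

A decoder that guesses at random would score about 0.5 on this metric, so values are usually read relative to that floor rather than to zero.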

Key Contributions

Ensemble attack network combining CNN (spatial domain) and Transformer (frequency domain) attack networks for adversarial training of watermarking models
Systematic ablation of CNN vs. Transformer and spatial vs. frequency domain combinations to identify the most effective configuration for watermark robustness
18.743% improvement over StegaStamp on the Regeneration Attack in the WAVES benchmark using the proposed ensemble training strategy
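To make the spatial-versus-frequency distinction in the contributions concrete, here is a toy sketch on a 1-D signal. It is not the paper's architecture: the fixed kernel stands in for a learned CNN, the fixed per-frequency gains stand in for a learned Transformer operating on spectra, and every name (`spatial_attack`, `frequency_attack`, `ensemble_attack`) is hypothetical. How the paper actually combines the two networks during training is not specified in this summary, so the sketch simply returns both distorted versions.

```python
import cmath
from typing import List, Sequence


def spatial_attack(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Spatial-domain distortion: zero-padded sliding-window filter
    (cross-correlation, as CNN layers compute)."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # samples outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def dft(x: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse DFT; keeps only the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


def frequency_attack(signal: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Frequency-domain distortion: rescale each DFT coefficient, then invert."""
    return idft([g * c for g, c in zip(gains, dft(signal))])


def ensemble_attack(signal, kernel, gains) -> List[List[float]]:
    """Return both distorted versions; a decoder would be trained on each."""
    return [spatial_attack(signal, kernel), frequency_attack(signal, gains)]


if __name__ == "__main__":
    watermarked = [1.0, 2.0, 3.0, 4.0]
    blurred, filtered = ensemble_attack(
        watermarked, kernel=[0.25, 0.5, 0.25], gains=[1.0, 0.5, 0.5, 0.5]
    )
    print(blurred)
    print(filtered)
```

In adversarial training the kernel and gains would be replaced by trainable networks that try to break decoding, while the watermark encoder and decoder are trained to survive both kinds of distortion.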

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses the robustness of content watermarks for AI-generated images. Watermarks are embedded in image outputs (not model weights) to enable provenance tracking; the ensemble attack network is an adversarial training technique that hardens these content watermarks against removal attacks, including regeneration attacks.

Details

Domains

vision, generative

Model Types

cnn, transformer, gan, diffusion

Threat Tags

inference_time, digital

Datasets

WAVES

Applications

image watermarking, ai-generated content provenance, copyright protection


Similar Papers

SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images

Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution

Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection

Diversity over Uniformity: Rethinking Representation in Generated Image Detection

Towards Sustainable Universal Deepfake Detection with Frequency-Domain Masking

SimuFreeMark: A Noise-Simulation-Free Robust Watermarking Against Image Editing

LiteUpdate: A Lightweight Framework for Updating AI-Generated Image Detectors

TrueMoE: Dual-Routing Mixture of Discriminative Experts for Synthetic Image Detection