Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID
Xia Han, Qi Li, Jianbing Ni, Mohammad Zulkernine
Published on arXiv (arXiv:2508.20228)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
SynGuard improves watermark recovery by an average of 11.1% in F1 score over SynthID-Text under paraphrasing, copy-paste, and back-translation attack scenarios.
SynGuard
Novel technique introduced
Recent advances in LLM watermarking methods such as SynthID-Text by Google DeepMind offer promising solutions for tracing the provenance of AI-generated text. However, our robustness assessment reveals that SynthID-Text is vulnerable to meaning-preserving attacks, such as paraphrasing, copy-paste modifications, and back-translation, which can significantly degrade watermark detectability. To address these limitations, we propose SynGuard, a hybrid framework that combines the semantic alignment strength of Semantic Information Retrieval (SIR) with the probabilistic watermarking mechanism of SynthID-Text. Our approach jointly embeds watermarks at both lexical and semantic levels, enabling robust provenance tracking while preserving the original meaning. Experimental results across multiple attack scenarios show that SynGuard improves watermark recovery by an average of 11.1% in F1 score compared to SynthID-Text. These findings demonstrate the effectiveness of semantic-aware watermarking in resisting real-world tampering. All code, datasets, and evaluation scripts are publicly available at: https://github.com/githshine/SynGuard.
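To make the "joint lexical and semantic embedding" idea concrete, here is a minimal, hedged sketch in the style of green-list logit-bias watermarking. It is not SynGuard's actual algorithm: `semantic_key` is a hypothetical stand-in for SIR's embedding-based semantic key (here, a bag-of-words hash, so paraphrases that keep the same content words tend to map to the same key), and the vocabulary split is seeded by both that key and the local n-gram context.

```python
import hashlib
import random

def semantic_key(text: str, num_buckets: int = 16) -> int:
    """Hypothetical stand-in for SIR's sentence-embedding key:
    bucket the text by a stable hash of its sorted bag of words,
    so word-order changes leave the key unchanged."""
    bag = " ".join(sorted(text.lower().split()))
    return hashlib.sha256(bag.encode()).digest()[0] % num_buckets

def green_list(context, sem_key: int, vocab_size: int, frac: float = 0.5):
    """Pseudorandom 'green' vocabulary split seeded by BOTH the local
    n-gram context (lexical level) and the semantic key (semantic level)."""
    seed = int.from_bytes(
        hashlib.sha256(repr((context, sem_key)).encode()).digest()[:4], "big"
    )
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(frac * vocab_size)])

def bias_logits(logits, context, sem_key: int, delta: float = 2.0):
    """Add a small boost to green-token logits before sampling; a
    detector later counts how many emitted tokens were green."""
    greens = green_list(context, sem_key, len(logits))
    return [x + delta if i in greens else x for i, x in enumerate(logits)]
```

Because the seed mixes in a paraphrase-tolerant semantic key, a detector can re-derive roughly the same green lists even after meaning-preserving rewrites, which is the intuition behind combining SIR with SynthID-Text's token-level mechanism.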
Key Contributions
- Robustness assessment demonstrating that SynthID-Text is vulnerable to meaning-preserving attacks (paraphrasing, copy-paste, back-translation) that significantly degrade watermark detectability
- SynGuard, a hybrid watermarking framework jointly embedding watermarks at both lexical (SynthID-Text probabilistic mechanism) and semantic (SIR) levels for improved provenance tracking
- Empirical evaluation showing SynGuard improves watermark recovery by an average of 11.1% in F1 score over SynthID-Text across multiple real-world attack scenarios
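The reported 11.1% gain is in F1 score, which frames watermark recovery as binary classification (watermarked vs. unwatermarked text, evaluated after each attack). For reference, a minimal implementation of the standard F1 metric:

```python
def f1_score(y_true, y_pred):
    """Standard F1 over binary labels: harmonic mean of precision and
    recall, where label 1 means 'detected as watermarked'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```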
🛡️ Threat Analysis
The paper assesses and strengthens watermarks embedded in LLM text OUTPUTS to trace AI-generated content provenance — classic output integrity. The attack surface (paraphrasing, copy-paste, back-translation defeating watermark detection) and the defense (SynGuard hybrid lexical+semantic watermarking) both squarely target content-level output integrity, not model weights or model IP.
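The output-integrity attack surface can be illustrated with a toy detector: score the text by the fraction of tokens landing in their position's green list, then threshold. Meaning-preserving edits (paraphrase, back-translation) substitute tokens and drag the score toward the unwatermarked base rate. This is a sketch only; the threshold is hypothetical, and real detectors use a calibrated statistical test rather than a fixed cutoff.

```python
def detection_score(tokens, green_sets):
    """Fraction of tokens that fall in their position's green list.
    Watermarked text scores well above the base rate (~0.5 for a
    half-vocabulary split); heavy paraphrasing pulls it back down."""
    hits = sum(1 for tok, greens in zip(tokens, green_sets) if tok in greens)
    return hits / len(tokens)

def is_watermarked(tokens, green_sets, threshold=0.7):
    # Illustrative fixed threshold; production detectors calibrate
    # a z-test or likelihood ratio against a false-positive budget.
    return detection_score(tokens, green_sets) >= threshold
```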