defense 2025

Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID

Xia Han , Qi Li , Jianbing Ni , Mohammad Zulkernine

0 citations


Published on arXiv (2508.20228)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SynGuard improves watermark recovery by an average of 11.1% in F1 score over SynthID-Text under paraphrasing, copy-paste, and back-translation attack scenarios.

SynGuard

Novel technique introduced


Recent advances in LLM watermarking methods such as SynthID-Text by Google DeepMind offer promising solutions for tracing the provenance of AI-generated text. However, our robustness assessment reveals that SynthID-Text is vulnerable to meaning-preserving attacks, such as paraphrasing, copy-paste modifications, and back-translation, which can significantly degrade watermark detectability. To address these limitations, we propose SynGuard, a hybrid framework that combines the semantic alignment strength of Semantic Information Retrieval (SIR) with the probabilistic watermarking mechanism of SynthID-Text. Our approach jointly embeds watermarks at both lexical and semantic levels, enabling robust provenance tracking while preserving the original meaning. Experimental results across multiple attack scenarios show that SynGuard improves watermark recovery by an average of 11.1% in F1 score compared to SynthID-Text. These findings demonstrate the effectiveness of semantic-aware watermarking in resisting real-world tampering. All code, datasets, and evaluation scripts are publicly available at: https://github.com/githshine/SynGuard.
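To make the hybrid idea concrete, the sketch below combines a lexical detection score (fraction of token pairs landing in a keyed pseudo-random "green list", in the style of hash-based LLM watermark detectors) with a semantic-level score (here a toy anchor-vocabulary overlap standing in for SIR's embedding similarity). All function names, the key, weights, and threshold are illustrative assumptions; this is not SynthID-Text's tournament sampling or the paper's SynGuard implementation.

```python
import hashlib

def green_fraction(tokens, key="demo-key", ratio=0.5):
    # Lexical score: fraction of adjacent token pairs whose keyed hash falls
    # below `ratio` (a toy "green list" test; illustrative, not SynthID-Text).
    if len(tokens) < 2:
        return 0.0
    hits = 0
    for prev, tok in zip(tokens, tokens[1:]):
        h = hashlib.sha256(f"{key}|{prev}|{tok}".encode()).digest()
        if h[0] / 255.0 < ratio:
            hits += 1
    return hits / (len(tokens) - 1)

def semantic_score(tokens, anchor_vocab):
    # Semantic proxy: overlap with an anchor vocabulary. SIR would use real
    # sentence embeddings here; this stand-in keeps the example dependency-free.
    if not tokens:
        return 0.0
    return sum(t in anchor_vocab for t in tokens) / len(tokens)

def hybrid_detect(text, anchor_vocab, w_lex=0.5, w_sem=0.5, threshold=0.45):
    # Weighted combination of the two channels, mirroring the joint
    # lexical + semantic detection idea (weights/threshold are assumptions).
    tokens = text.lower().split()
    score = w_lex * green_fraction(tokens) + w_sem * semantic_score(tokens, anchor_vocab)
    return score, score >= threshold
```

Because the semantic channel depends on meaning-level features rather than exact token identity, a paraphrase that breaks the lexical green-list signal can still leave the combined score above threshold, which is the intuition behind the robustness gain.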


Key Contributions

  • Robustness assessment demonstrating that SynthID-Text is vulnerable to meaning-preserving attacks (paraphrasing, copy-paste, back-translation) that significantly degrade watermark detectability
  • SynGuard, a hybrid watermarking framework jointly embedding watermarks at both lexical (SynthID-Text probabilistic mechanism) and semantic (SIR) levels for improved provenance tracking
  • Empirical evaluation showing SynGuard improves watermark recovery by an average 11.1% F1 score over SynthID-Text across multiple real-world attack scenarios
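The reported 11.1% gain is measured in detection F1. As a reference point, a minimal sketch of how F1 is computed from per-document detection counts (watermarked texts flagged = true positives, clean texts flagged = false positives, watermarked texts missed after an attack = false negatives); this is a standard formula, not the paper's evaluation script:

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall over detection decisions.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this metric, an attack that strips the watermark from previously detectable texts raises `fn` and drives recall (and hence F1) down, which is exactly the degradation the robustness assessment quantifies.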

🛡️ Threat Analysis

Output Integrity Attack

The paper assesses and strengthens watermarks embedded in LLM text OUTPUTS to trace AI-generated content provenance — classic output integrity. The attack surface (paraphrasing, copy-paste, back-translation defeating watermark detection) and the defense (SynGuard hybrid lexical+semantic watermarking) both squarely target content-level output integrity, not model weights or model IP.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
llm-generated text provenance tracking, ai text detection