defense 2025

Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID

Xia Han , Qi Li , Jianbing Ni , Mohammad Zulkernine

0 citations


Published on arXiv (2508.20228)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SynGuard improves watermark recovery by an average of 11.1% in F1 score over SynthID-Text under paraphrasing, copy-paste, and back-translation attack scenarios.

SynGuard

Novel technique introduced


Recent advances in LLM watermarking methods such as SynthID-Text by Google DeepMind offer promising solutions for tracing the provenance of AI-generated text. However, our robustness assessment reveals that SynthID-Text is vulnerable to meaning-preserving attacks, such as paraphrasing, copy-paste modifications, and back-translation, which can significantly degrade watermark detectability. To address these limitations, we propose SynGuard, a hybrid framework that combines the semantic alignment strength of Semantic Information Retrieval (SIR) with the probabilistic watermarking mechanism of SynthID-Text. Our approach jointly embeds watermarks at both lexical and semantic levels, enabling robust provenance tracking while preserving the original meaning. Experimental results across multiple attack scenarios show that SynGuard improves watermark recovery by an average of 11.1% in F1 score compared to SynthID-Text. These findings demonstrate the effectiveness of semantic-aware watermarking in resisting real-world tampering. All code, datasets, and evaluation scripts are publicly available at: https://github.com/githshine/SynGuard.
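To make the hybrid idea concrete, the sketch below combines a lexical detection score (fraction of token pairs landing in a keyed pseudo-random "green list", in the style of hash-based LLM watermark detectors) with a semantic-level score (here a toy anchor-vocabulary overlap standing in for SIR's embedding similarity). All function names, the key, weights, and threshold are illustrative assumptions; this is not SynthID-Text's tournament sampling or the paper's SynGuard implementation.

```python
import hashlib

def green_fraction(tokens, key="demo-key", ratio=0.5):
    # Lexical score: fraction of adjacent token pairs whose keyed hash falls
    # below `ratio` (a toy "green list" test; illustrative, not SynthID-Text).
    if len(tokens) < 2:
        return 0.0
    hits = 0
    for prev, tok in zip(tokens, tokens[1:]):
        h = hashlib.sha256(f"{key}|{prev}|{tok}".encode()).digest()
        if h[0] / 255.0 < ratio:
            hits += 1
    return hits / (len(tokens) - 1)

def semantic_score(tokens, anchor_vocab):
    # Semantic proxy: overlap with an anchor vocabulary. SIR would use real
    # sentence embeddings here; this stand-in keeps the example dependency-free.
    if not tokens:
        return 0.0
    return sum(t in anchor_vocab for t in tokens) / len(tokens)

def hybrid_detect(text, anchor_vocab, w_lex=0.5, w_sem=0.5, threshold=0.45):
    # Weighted combination of the two channels, mirroring the joint
    # lexical + semantic detection idea (weights/threshold are assumptions).
    tokens = text.lower().split()
    score = w_lex * green_fraction(tokens) + w_sem * semantic_score(tokens, anchor_vocab)
    return score, score >= threshold
```

Because the semantic channel depends on meaning-level features rather than exact token identity, a paraphrase that breaks the lexical green-list signal can still leave the combined score above threshold, which is the intuition behind the robustness gain.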


Key Contributions

  • Robustness assessment demonstrating that SynthID-Text is vulnerable to meaning-preserving attacks (paraphrasing, copy-paste, back-translation) that significantly degrade watermark detectability
  • SynGuard, a hybrid watermarking framework jointly embedding watermarks at both lexical (SynthID-Text probabilistic mechanism) and semantic (SIR) levels for improved provenance tracking
  • Empirical evaluation showing SynGuard improves watermark recovery by an average 11.1% F1 score over SynthID-Text across multiple real-world attack scenarios
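The reported 11.1% gain is measured in detection F1. As a reference point, a minimal sketch of how F1 is computed from per-document detection counts (watermarked texts flagged = true positives, clean texts flagged = false positives, watermarked texts missed after an attack = false negatives); this is a standard formula, not the paper's evaluation script:

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall over detection decisions.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this metric, an attack that strips the watermark from previously detectable texts raises `fn` and drives recall (and hence F1) down, which is exactly the degradation the robustness assessment quantifies.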

🛡️ Threat Analysis

Output Integrity Attack

The paper assesses and strengthens watermarks embedded in LLM text OUTPUTS to trace AI-generated content provenance — classic output integrity. The attack surface (paraphrasing, copy-paste, back-translation defeating watermark detection) and the defense (SynGuard hybrid lexical+semantic watermarking) both squarely target content-level output integrity, not model weights or model IP.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
llm-generated text provenance tracking, ai text detection