Semantic Differentiation for Tackling Challenges in Watermarking Low-Entropy Constrained Generation Outputs
Nghia T. Le, Alan Ritter, Kartik Goyal
Published on arXiv
2601.11629
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
SeqMark achieves up to 28% F1 improvement in watermark detection over prior methods on constrained generation tasks while maintaining high output quality.
SeqMark
Novel technique introduced
We demonstrate that while current approaches to language model watermarking are effective for open-ended generation, they are inadequate for watermarking LM outputs in constrained generation tasks with low-entropy output spaces. We therefore devise SeqMark, a sequence-level watermarking algorithm with semantic differentiation that balances output quality, watermark detectability, and imperceptibility. It addresses a shortcoming of the prevalent token-level watermarking algorithms, which under-utilize the sequence-level entropy available in constrained generation tasks. Moreover, we identify and improve upon a different failure mode, which we term region collapse, associated with prior sequence-level watermarking algorithms. Region collapse occurs because the pseudorandom partitioning of semantic space in these approaches causes all high-probability outputs to fall into a single region, either valid or invalid, leading to a trade-off between output quality and watermarking effectiveness. SeqMark instead differentiates the high-probability output subspace and partitions it into valid and invalid regions, ensuring an even spread of high-quality outputs among all the regions. On various constrained generation tasks such as machine translation, code generation, and abstractive summarization, SeqMark substantially improves watermark detection accuracy (up to 28% increase in F1) while maintaining high generation quality.
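The abstract refers to watermark detectability without stating which test statistic is used. As a hedged illustration only, the sketch below shows a one-proportion z-test, a statistic commonly used for partition-based watermarks: it compares how many units of a text fall in the valid region against what an unwatermarked text would be expected to produce. The function name `detection_z_score`, the `valid_fraction` parameter, and the choice of this statistic are assumptions for illustration; this summary does not say that SeqMark's detector works this way.

```python
import math


def detection_z_score(num_valid, num_total, valid_fraction=0.5):
    """One-proportion z-score for a partition-based watermark (illustrative).

    num_valid:      units (e.g. sequences) observed in the valid region
    num_total:      units inspected
    valid_fraction: assumed share of the space marked valid, i.e. the
                    chance that an unwatermarked unit lands there
    """
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    expected = valid_fraction * num_total
    std = math.sqrt(num_total * valid_fraction * (1.0 - valid_fraction))
    return (num_valid - expected) / std


# Unwatermarked-looking text: half of the units land in the valid region.
baseline = detection_z_score(50, 100)
# Watermarked-looking text: most units land in the valid region.
marked = detection_z_score(90, 100)
```

Under this framing, a larger z-score is stronger evidence of a watermark. If every high-quality output sits in the invalid region, a generator must either emit lower-quality text or leave the output unmarked, which is the trade-off the abstract attributes to region collapse.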
Key Contributions
- Identifies that token-level watermarking under-utilizes sequence-level entropy and fails in low-entropy constrained generation tasks (translation, code, summarization)
- Identifies 'region collapse' — a failure mode in prior sequence-level watermarking where pseudorandom semantic partitioning causes all high-probability outputs to cluster in a single region
- Proposes SeqMark, a sequence-level watermarking algorithm with semantic differentiation that spreads high-quality outputs across valid/invalid regions, improving F1 by up to 28%
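The region-collapse failure mode in the contributions above can be reproduced in a toy simulation. The sketch below is not the SeqMark algorithm; it is a minimal illustration under stated assumptions: candidate outputs are represented by synthetic embeddings clustered tightly around one point (standing in for a low-entropy output space), the "global" partition is a random hyperplane through the origin, and the "local" partition is a hypothetical stand-in that splits the candidates at their own median projection. All function names and parameters are invented for this example.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def global_partition(embeddings, normal):
    """Label 1 (valid) or 0 (invalid) by the side of a hyperplane through the origin."""
    return [1 if dot(e, normal) >= 0 else 0 for e in embeddings]


def local_partition(embeddings, normal):
    """Label by the side of a hyperplane placed at the candidates' median projection.

    Hypothetical stand-in for partitioning only the high-probability subspace.
    """
    projections = [dot(e, normal) for e in embeddings]
    threshold = sorted(projections)[len(projections) // 2]
    return [1 if p >= threshold else 0 for p in projections]


def simulate(trials=500, num_candidates=8, dim=16, noise=0.02, seed=0):
    """Return (collapse rate of the global split, balance rate of the local split)."""
    rng = random.Random(seed)
    collapsed = 0
    balanced = 0
    for _ in range(trials):
        # Tight cluster of candidates: near-identical high-probability outputs.
        candidates = [
            [1.0 + rng.gauss(0.0, noise) for _ in range(dim)]
            for _ in range(num_candidates)
        ]
        # Pseudorandom (keyed) hyperplane normal.
        normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        global_labels = global_partition(candidates, normal)
        local_labels = local_partition(candidates, normal)
        if sum(global_labels) in (0, num_candidates):
            collapsed += 1  # every candidate fell into a single region
        if sum(local_labels) == num_candidates - num_candidates // 2:
            balanced += 1  # candidates split evenly across regions
    return collapsed / trials, balanced / trials


collapse_rate, balanced_rate = simulate()
print(f"global split collapsed in {collapse_rate:.0%} of trials")
print(f"local split balanced in {balanced_rate:.0%} of trials")
```

In this toy setting the global split places all candidates on one side in most trials, because the cluster is far narrower than its distance from the origin, whereas the median-based split always leaves candidates in both regions. How SeqMark actually differentiates and partitions the high-probability subspace is not specified in this summary.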
🛡️ Threat Analysis
SeqMark embeds watermarks in LLM-generated text outputs (machine translation, code, summarization) to trace content provenance — this is classic output integrity / content watermarking, not model-weight watermarking (ML05).