Semantic Differentiation for Tackling Challenges in Watermarking Low-Entropy Constrained Generation Outputs

We demonstrate that while current approaches to language model watermarking are effective for open-ended generation, they are inadequate for watermarking LM outputs in constrained generation tasks with low-entropy output spaces. We therefore devise SeqMark, a sequence-level watermarking algorithm with semantic differentiation that balances output quality, watermark detectability, and imperceptibility. It addresses a shortcoming of the prevalent token-level watermarking algorithms, which under-utilize the sequence-level entropy available in constrained generation tasks. Moreover, we identify and address a different failure mode, which we term region collapse, associated with prior sequence-level watermarking algorithms. Region collapse occurs because the pseudorandom partitioning of semantic space in these approaches causes all high-probability outputs to fall into either the valid or the invalid region, forcing a trade-off between output quality and watermarking effectiveness. SeqMark instead differentiates the high-probability output subspace and partitions it into valid and invalid regions, ensuring an even spread of high-quality outputs across all regions. On various constrained generation tasks, including machine translation, code generation, and abstractive summarization, SeqMark substantially improves watermark detection accuracy (up to a 28% increase in F1) while maintaining high generation quality.

Key Contributions

Identifies that token-level watermarking under-utilizes sequence-level entropy and fails in low-entropy constrained generation tasks (translation, code, summarization)
Identifies 'region collapse' — a failure mode in prior sequence-level watermarking where pseudorandom semantic partitioning causes all high-probability outputs to cluster in a single region
Proposes SeqMark, a sequence-level watermarking algorithm with semantic differentiation that spreads high-quality outputs across valid/invalid regions, improving F1 by up to 28%
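
To make the region-collapse idea above concrete, here is a toy sketch in Python. It is not SeqMark's algorithm and not any prior method's implementation: the 8-dimensional "embeddings", the hyperplane-through-the-origin partition, the median split, and every function name are assumptions made purely for illustration. It shows that when high-probability candidates sit close together in semantic space, a key-seeded random partition tends to put all of them on the same side, whereas a partition computed relative to the candidates themselves always leaves some in each region.

```python
import random


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def seeded_direction(dim, key):
    """Pseudorandom direction derived from a watermark key (hypothetical helper)."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def random_hyperplane_partition(vectors, key):
    """Label each vector valid (True) or invalid (False) by which side of a
    key-seeded hyperplane through the origin it falls on. The partition
    ignores where the candidates actually lie."""
    normal = seeded_direction(len(vectors[0]), key)
    return [dot(normal, v) >= 0.0 for v in vectors]


def differentiated_partition(vectors, key):
    """Illustrative alternative: project the candidates onto a key-seeded
    direction and split at the median rank of their own projections, so
    the upper half is valid and the lower half is invalid."""
    direction = seeded_direction(len(vectors[0]), key)
    projections = [dot(direction, v) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda i: projections[i])
    labels = [False] * len(vectors)
    for rank, i in enumerate(order):
        labels[i] = rank >= len(vectors) // 2
    return labels


def is_collapsed(labels):
    """True when every candidate landed in the same region."""
    return all(labels) or not any(labels)


def make_candidates(n=10, dim=8, spread=0.01, seed=0):
    """Toy stand-in for near-paraphrase outputs: a tight cluster of points
    around one location far from the origin."""
    rng = random.Random(seed)
    return [[5.0 + rng.gauss(0.0, spread) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    candidates = make_candidates()
    keys = range(200)
    collapsed = sum(
        is_collapsed(random_hyperplane_partition(candidates, k)) for k in keys
    )
    print(f"random partition collapsed for {collapsed} of {len(keys)} keys")
    balanced = sum(
        not is_collapsed(differentiated_partition(candidates, k)) for k in keys
    )
    print(f"differentiated partition stayed split for {balanced} of {len(keys)} keys")
```

In this toy setting a collapsed partition means either every good candidate is invalid (quality must be sacrificed to embed the watermark) or every good candidate is valid (unwatermarked text also looks watermarked), which is the trade-off the abstract describes. The median split is only one simple way to force balance; the paper should be consulted for how SeqMark actually performs semantic differentiation.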

🛡️ Threat Analysis

Output Integrity Attack

SeqMark embeds watermarks in LLM-generated text outputs (machine translation, code, summarization) to trace content provenance — this is classic output integrity / content watermarking, not model-weight watermarking (ML05).

Details

Domains

nlp

Model Types

llm
transformer

Threat Tags

inference_time

Applications

2025, 0 citations
