defense 2026

Semantic Differentiation for Tackling Challenges in Watermarking Low-Entropy Constrained Generation Outputs

Nghia T. Le, Alan Ritter, Kartik Goyal

0 citations · 38 references · arXiv

Published on arXiv

2601.11629

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SeqMark achieves up to 28% F1 improvement in watermark detection over prior methods on constrained generation tasks while maintaining high output quality.

SeqMark

Novel technique introduced


We demonstrate that while current approaches to language model watermarking are effective for open-ended generation, they are inadequate for watermarking LM outputs in constrained generation tasks with low-entropy output spaces. We therefore devise SeqMark, a sequence-level watermarking algorithm with semantic differentiation that balances output quality, watermark detectability, and imperceptibility. It addresses a shortcoming of the prevalent token-level watermarking algorithms, which under-utilize the sequence-level entropy available in constrained generation tasks. Moreover, we identify and improve upon a different failure mode, which we term region collapse, associated with prior sequence-level watermarking algorithms. Region collapse occurs because the pseudorandom partitioning of semantic space in these approaches causes all high-probability outputs to collapse into either invalid or valid regions, forcing a trade-off between output quality and watermarking effectiveness. SeqMark instead differentiates the high-probability output subspace and partitions it into valid and invalid regions, ensuring that high-quality outputs are spread evenly among the regions. On constrained generation tasks such as machine translation, code generation, and abstractive summarization, SeqMark substantially improves watermark detection accuracy (up to a 28% increase in F1) while maintaining high generation quality.
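The region-collapse failure mode described above can be illustrated with a toy sketch. This is not the SeqMark algorithm or any prior method's implementation: the hyperplane partition, the median split, and every function name below are assumptions chosen only for illustration. The sketch assumes candidate outputs are represented as small embedding vectors and that a region is the side of a pseudorandom hyperplane on which a vector falls.

```python
import random


def random_hyperplane(dim, seed):
    """Hypothetical helper: a pseudorandom hyperplane normal derived from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def global_partition(embeddings, normal):
    """Label each candidate by its side of one fixed hyperplane (1 = valid, 0 = invalid).

    Candidates that are semantically close tend to land on the same side,
    which is the toy analogue of region collapse.
    """
    return [1 if dot(e, normal) >= 0 else 0 for e in embeddings]


def local_partition(embeddings, normal):
    """Toy stand-in for differentiating the candidate set itself (an assumption).

    Candidates are ranked by their projection onto the normal and split at the
    median, so both regions always receive candidates.
    """
    scores = [dot(e, normal) for e in embeddings]
    order = sorted(range(len(embeddings)), key=lambda i: scores[i])
    labels = [0] * len(embeddings)
    for rank, i in enumerate(order):
        labels[i] = 1 if rank >= len(order) // 2 else 0
    return labels


if __name__ == "__main__":
    # Four near-identical candidates, as in a low-entropy task such as translation.
    candidates = [[1.0, 1.0], [1.01, 0.99], [0.99, 1.01], [1.0, 1.02]]
    normal = [1.0, 0.5]  # fixed here for reproducibility; random_hyperplane(2, seed) also works
    print("global:", global_partition(candidates, normal))
    print("local: ", local_partition(candidates, normal))
```

With the fixed hyperplane, every candidate receives the same label, so the generator has no choice between regions. The median split populates both regions, which corresponds loosely to the "even spread" the abstract attributes to semantic differentiation.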


Key Contributions

  • Identifies that token-level watermarking under-utilizes sequence-level entropy and fails in low-entropy constrained generation tasks (translation, code, summarization)
  • Identifies 'region collapse' — a failure mode in prior sequence-level watermarking where pseudorandom semantic partitioning causes all high-probability outputs to cluster in a single region
  • Proposes SeqMark, a sequence-level watermarking algorithm with semantic differentiation that spreads high-quality outputs across valid/invalid regions, improving F1 by up to 28%
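Since the headline result is stated in terms of F1, a minimal sketch of how detection F1 is computed may help. The labels below are made-up placeholders, not results from the paper, and the function name is hypothetical. A label of 1 means "watermarked" and 0 means "not watermarked".

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder data: three watermarked and three unwatermarked texts.
    truth = [1, 1, 1, 0, 0, 0]
    detector_output = [1, 1, 0, 1, 0, 0]
    print(f1_score(truth, detector_output))
```

A detector that misses watermarked texts lowers recall, and one that flags unwatermarked texts lowers precision. F1 penalizes both, which is why it is a natural metric for the detection comparison reported here.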

🛡️ Threat Analysis

Output Integrity Attack

SeqMark embeds watermarks in LLM-generated text (machine translation, code, summarization) to trace content provenance. This is output integrity (content) watermarking, not model-weight watermarking (ML05).


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
machine translation, code generation, abstractive summarization, llm text provenance