ArcMark: Multi-bit LLM Watermark via Optimal Transport

Watermarking is an important tool for promoting the responsible use of language models (LMs). Existing watermarks insert a signal into generated tokens that either flags LM-generated text (zero-bit watermarking) or encodes more complex messages (multi-bit watermarking). Though a number of recent multi-bit watermarks insert several bits into text without perturbing average next-token predictions, they largely extend design principles from the zero-bit setting, such as encoding a single bit per token. Notably, the information-theoretic capacity of multi-bit watermarking -- the maximum number of bits per token that can be inserted and detected without changing average next-token predictions -- has remained unknown. We address this gap by deriving the first capacity characterization of multi-bit watermarks. Our results inform the design of ArcMark: a new watermark construction based on coding-theoretic principles that, under certain assumptions, achieves the capacity of the multi-bit watermark channel. In practice, ArcMark outperforms competing multi-bit watermarks in terms of bit rate per token and detection accuracy. Our work demonstrates that LM watermarking is fundamentally a channel coding problem, paving the way for principled coding-theoretic approaches to watermark design.

Key Contributions

First information-theoretic capacity characterization of distortion-free multi-bit LLM watermarking, framing it as a channel coding problem with side information
ArcMark: a new multi-bit watermark construction using optimal transport to map messages encoded as angles on a circle to token distributions, achieving capacity under certain assumptions
Empirical demonstration that ArcMark outperforms competing multi-bit watermarks (e.g., BiMark) in bit rate per token and detection accuracy by encoding the full message jointly rather than token-by-token
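
As a rough illustration of what "distortion-free" and "messages encoded as angles on a circle" can mean, here is a minimal toy sketch. It is not ArcMark's construction, which this summary does not spell out. It is a simple cyclic-shift inverse-CDF sampler, assuming a key `u` drawn uniformly from [0, 1) and shared by encoder and decoder, a fixed next-token distribution `probs`, and a decoder that also knows `probs` (a strong simplification). All function names are hypothetical.

```python
import random


def cdf_interval_token(probs, point):
    """Return the index of the token whose CDF interval contains `point` in [0, 1)."""
    cumulative = 0.0
    for token, p in enumerate(probs):
        cumulative += p
        if point < cumulative:
            return token
    return len(probs) - 1  # guard against floating-point round-off


def embed_token(probs, message, num_messages, u):
    """Rotate the shared key u by the message's angle (as a fraction of a full turn),
    then sample by inverse CDF at the rotated point."""
    return cdf_interval_token(probs, (u + message / num_messages) % 1.0)


def decode(tokens, keys, probs, num_messages):
    """Toy decoder: pick the message whose rotation reproduces the most observed tokens."""
    def score(m):
        return sum(embed_token(probs, m, num_messages, u) == t
                   for t, u in zip(tokens, keys))
    return max(range(num_messages), key=score)


def empirical_distribution(probs, message, num_messages, trials, seed=0):
    """Estimate the token distribution averaged over the random key, for a fixed message."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(trials):
        counts[embed_token(probs, message, num_messages, rng.random())] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]   # assumed toy next-token distribution
    num_messages = 4          # i.e. a 2-bit message
    key_rng = random.Random(1)
    keys = [key_rng.random() for _ in range(64)]
    tokens = [embed_token(probs, 3, num_messages, u) for u in keys]
    print("decoded message:", decode(tokens, keys, probs, num_messages))
    print("averaged distribution:",
          empirical_distribution(probs, 3, num_messages, trials=100_000))
```

Because a uniform key stays uniform after rotation, the token distribution averaged over the key matches `probs` for every message, which is the distortion-free property in miniature. Note that the message here is spread jointly over all tokens rather than assigned one bit per token.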

🛡️ Threat Analysis

Output Integrity Attack

ArcMark watermarks LLM text outputs to encode provenance metadata (model identity, user ID) and to trace AI-generated content. This is content output watermarking, not model ownership watermarking: the watermark is embedded in the generated tokens, not in the model weights.

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

inference_time

Applications

2026 1 cit.