defense 2026

ArcMark: Multi-bit LLM Watermark via Optimal Transport

Atefeh Gilani 1, Carol Xuan Long 2, Sajani Vithana 2, Oliver Kosut 1, Lalitha Sankar 1, Flavio P. Calmon 2

0 citations · 24 references · arXiv (Cornell University)

Published on arXiv

2602.07235

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

ArcMark achieves the information-theoretic capacity of the multi-bit watermark channel and outperforms competing methods in bit rate per token and detection accuracy while remaining distortion-free.

ArcMark

Novel technique introduced


Watermarking is an important tool for promoting the responsible use of language models (LMs). Existing watermarks insert a signal into generated tokens that either flags LM-generated text (zero-bit watermarking) or encodes more complex messages (multi-bit watermarking). Though a number of recent multi-bit watermarks insert several bits into text without perturbing average next-token predictions, they largely extend design principles from the zero-bit setting, such as encoding a single bit per token. Notably, the information-theoretic capacity of multi-bit watermarking -- the maximum number of bits per token that can be inserted and detected without changing average next-token predictions -- has remained unknown. We address this gap by deriving the first capacity characterization of multi-bit watermarks. Our results inform the design of ArcMark: a new watermark construction based on coding-theoretic principles that, under certain assumptions, achieves the capacity of the multi-bit watermark channel. In practice, ArcMark outperforms competing multi-bit watermarks in terms of bit rate per token and detection accuracy. Our work demonstrates that LM watermarking is fundamentally a channel coding problem, paving the way for principled coding-theoretic approaches to watermark design.


Key Contributions

  • First information-theoretic capacity characterization of distortion-free multi-bit LLM watermarking, framing it as a channel coding problem with side information
  • ArcMark: a new multi-bit watermark construction using optimal transport to map messages encoded as angles on a circle to token distributions, achieving capacity under certain assumptions
  • Empirical demonstration that ArcMark outperforms competing multi-bit watermarks (e.g., BiMark) in bit rate per token and detection accuracy by encoding the full message jointly rather than token-by-token
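
To make "distortion-free" and "messages encoded as angles on a circle" concrete, here is a minimal toy sketch. It is not ArcMark's construction: instead of optimal transport, it simply shifts a shared pseudo-random point around the unit circle by the message angle and feeds the result to an inverse-CDF sampler. Because the shifted point is still uniform, each token keeps its original next-token probability on average. The function names (shared_uniform, pick_token, embed, decode), the SHA-256 key derivation, and the assumption that the decoder knows the key and the next-token distributions are all illustrative choices, not details taken from the paper.

```python
import hashlib
import random
from bisect import bisect_right
from itertools import accumulate


def shared_uniform(key: str, position: int) -> float:
    """Pseudo-random u in [0, 1) that encoder and decoder can both recompute."""
    digest = hashlib.sha256(f"{key}:{position}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def pick_token(probs, point):
    """Inverse-CDF lookup: index of the sub-interval of [0, 1) containing point."""
    cdf = list(accumulate(probs))
    # Clamp in case rounding leaves the last CDF value slightly below 1.
    return min(bisect_right(cdf, point), len(probs) - 1)


def embed(probs_per_step, message, num_messages, key):
    """Sample one token per step, shifting the shared point by the message angle."""
    theta = message / num_messages  # message as a fraction of a full turn
    return [
        pick_token(probs, (shared_uniform(key, t) + theta) % 1.0)
        for t, probs in enumerate(probs_per_step)
    ]


def decode(tokens, probs_per_step, num_messages, key):
    """Return the message whose shift is consistent with the most observed tokens."""
    scores = [0] * num_messages
    for t, (token, probs) in enumerate(zip(tokens, probs_per_step)):
        u = shared_uniform(key, t)
        for m in range(num_messages):
            if pick_token(probs, (u + m / num_messages) % 1.0) == token:
                scores[m] += 1
    return max(range(num_messages), key=scores.__getitem__)


# Toy usage: 200 steps, 8-token vocabulary, 16 possible messages (4 bits).
rng = random.Random(0)
steps = []
for _ in range(200):
    weights = [rng.random() + 0.05 for _ in range(8)]
    total = sum(weights)
    steps.append([w / total for w in weights])

tokens = embed(steps, message=11, num_messages=16, key="demo-key")
recovered = decode(tokens, steps, num_messages=16, key="demo-key")
```

In this toy setup the true message is consistent with every token, while wrong messages disagree on some steps, so the decoder relies on many tokens to separate them. The rate here is log2(16)/200 = 0.02 bits per token, which only illustrates the distortion-free idea and says nothing about the rates ArcMark reports.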

🛡️ Threat Analysis

Output Integrity Attack

ArcMark watermarks LLM text outputs to encode provenance metadata (model identity, user ID) and trace AI-generated content. This is content output watermarking, not model ownership watermarking: the watermark is embedded in generated tokens, not model weights.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
llm text provenance tracking, ai-generated content attribution, regulatory compliance marking