defense arXiv Mar 31, 2026 · 6d ago
Tor Lattimore · Google DeepMind
Near-optimal detection test for Gumbel watermarking of LLM text outputs with problem-dependent statistical efficiency guarantees
Output Integrity Attack nlp
We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.
llm Google DeepMind