Towards Anytime-Valid Statistical Watermarking
Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, Michael I. Jordan
Published on arXiv
2602.17608
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Reduces average token budget required for watermark detection by 13–15% relative to state-of-the-art baselines while preserving anytime-valid Type-I error guarantees.
Anchored E-Watermarking
Novel technique introduced
The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach for selecting sampling distributions and the reliance on fixed-horizon hypothesis testing, which precludes valid early stopping. In this paper, we bridge these gaps by developing the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Unlike traditional approaches, where optional stopping invalidates Type-I error guarantees, our framework enables valid anytime inference by constructing a test supermartingale for the detection process. By leveraging an anchor distribution to approximate the target model, we characterize the optimal e-value with respect to the worst-case log-growth rate and derive the optimal expected stopping time. Our theoretical claims are substantiated by simulations and evaluations on established benchmarks, showing that our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13–15% relative to state-of-the-art baselines.
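The reason a test supermartingale permits early stopping can be written in a few lines. This is the standard textbook argument, not the paper's specific construction; the symbols $e_t$ (per-token e-value), $M_t$ (accumulated evidence), $\mathcal{F}_t$ (information up to token $t$), and $\alpha$ (target Type-I error) are notation assumed here for illustration.

```latex
% Assumption: each e_t is nonnegative and satisfies
% E_{H_0}[ e_t | F_{t-1} ] <= 1 under the null (unwatermarked text).
M_t = \prod_{s=1}^{t} e_s, \qquad M_0 = 1,
\qquad
\mathbb{E}_{H_0}\!\left[ M_t \mid \mathcal{F}_{t-1} \right] \le M_{t-1}.

% Ville's inequality for nonnegative supermartingales:
\mathbb{P}_{H_0}\!\left( \exists\, t \ge 1 : M_t \ge 1/\alpha \right) \le \alpha.

% Resulting stopping rule (declare "watermarked" at the first crossing):
\tau = \inf\{\, t \ge 1 : M_t \ge 1/\alpha \,\}.
```

Because the bound holds simultaneously over all $t$, the detector may inspect $M_t$ after every token and stop at $\tau$ without inflating the false-positive rate. As a general heuristic for i.i.d. e-values (not the paper's exact expression), the expected stopping time under the alternative scales like $\log(1/\alpha) / \mathbb{E}_{H_1}[\log e_t]$ for small $\alpha$, which is why the log-growth rate is the natural quantity to optimize.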
Key Contributions
- First e-value-based watermarking framework ('Anchored E-Watermarking') that enables anytime-valid inference via a test supermartingale, allowing valid early stopping without inflating Type-I error
- Principled characterization of the optimal e-value with respect to worst-case log-growth rate using an anchor distribution to approximate the target LLM
- Derivation of optimal expected stopping time and empirical validation showing 13–15% reduction in average token budget for detection versus state-of-the-art baselines
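The early-stopping idea behind these contributions can be illustrated with a toy simulation. The sketch below is not the Anchored E-Watermarking e-value: it assumes a hypothetical detector that sees one score per token, uniform on (0, 1) for human text and Beta(a, 1) for watermarked text, and uses the likelihood ratio between the two as the e-value. The names `e_value`, `sequential_detect`, and `score_stream`, and the parameter `a`, are invented for this example.

```python
import math
import random


def e_value(u: float, a: float = 2.0) -> float:
    """Toy per-token e-value: density ratio of Beta(a, 1) to Uniform(0, 1).

    Its mean under Uniform(0, 1) is exactly 1, so running products form a
    test martingale under the null (unwatermarked text).
    """
    return a * u ** (a - 1.0)


def sequential_detect(scores, alpha: float = 0.01, a: float = 2.0):
    """Multiply e-values token by token; stop once wealth reaches 1/alpha.

    Returns (detected, tokens_consumed).
    """
    threshold = math.log(1.0 / alpha)
    log_wealth = 0.0
    n = 0
    for n, u in enumerate(scores, start=1):
        # Guard against log(0) when a score is exactly 0.
        log_wealth += math.log(max(e_value(u, a), 1e-300))
        if log_wealth >= threshold:
            return True, n
    return False, n


def score_stream(watermarked: bool, n_tokens: int, rng: random.Random, a: float = 2.0):
    """Hypothetical score generator: Uniform under H0, Beta(a, 1) under H1."""
    for _ in range(n_tokens):
        u = rng.random()
        # If U is uniform, U ** (1 / a) follows Beta(a, 1).
        yield u ** (1.0 / a) if watermarked else u


if __name__ == "__main__":
    rng = random.Random(0)
    stops = [sequential_detect(score_stream(True, 2000, rng))[1] for _ in range(500)]
    false_alarms = sum(
        sequential_detect(score_stream(False, 200, rng))[0] for _ in range(500)
    )
    print("mean tokens to detect:", sum(stops) / len(stops))
    print("false-alarm rate:", false_alarms / 500)
```

In this toy setting the detector stops as soon as the accumulated evidence crosses 1/alpha, so the number of tokens consumed varies per document, while the false-alarm rate on unwatermarked streams stays at or below alpha. A fixed-horizon test would instead have to commit to a token count in advance.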
🛡️ Threat Analysis
Embeds watermarks in LLM text outputs to verify content provenance and distinguish machine-generated from human text. This is output integrity / content authentication, not model IP protection (ML05): the watermark lives in the generated tokens, not in the model weights.