Defense · 2026

WorldCup Sampling for Multi-bit LLM Watermarking

Yidan Wang 1,2, Yubing Ren 1,2, Yanan Cao 1,2, Li Guo 1,2

0 citations · 61 references · arXiv (Cornell University)

Published on arXiv

arXiv:2602.01752

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

WorldCup outperforms prior multi-bit watermarking baselines across capacity, detectability, robustness, text quality, and decoding efficiency simultaneously

WorldCup

Novel technique introduced


As large language models (LLMs) generate increasingly human-like text, watermarking offers a promising path to reliable attribution beyond mere detection. While multi-bit watermarking enables richer provenance encoding, existing methods largely extend zero-bit schemes through seed-driven steering, leading to indirect information flow, limited effective capacity, and suboptimal decoding. In this paper, we propose WorldCup, a multi-bit watermarking framework for LLMs that treats sampling as a natural communication channel and embeds message bits directly into token selection via a hierarchical competition mechanism guided by complementary signals. WorldCup further adopts entropy-aware modulation to preserve generation quality and supports robust message recovery through confidence-aware decoding. Comprehensive experiments show that WorldCup achieves a strong balance across capacity, detectability, robustness, text quality, and decoding efficiency, consistently outperforming prior baselines and laying a solid foundation for future LLM watermarking studies.
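To make the "sampling as a communication channel" idea concrete, here is a minimal sketch of multi-bit embedding via biased token selection. It assumes a simple keyed two-way vocabulary split and a fixed bias `delta`, which is a generic stand-in and not WorldCup's actual hierarchical competition mechanism; all function names here are illustrative.

```python
import hashlib
import math
import random

def partition(vocab_size, prev_token, key="wm-key"):
    """Pseudorandomly split the vocabulary into two halves, seeded by a
    secret key and the previous token (illustrative stand-in for the
    paper's hierarchical competition)."""
    ids = list(range(vocab_size))
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    random.Random(seed).shuffle(ids)
    half = vocab_size // 2
    return set(ids[:half]), set(ids[half:])

def embed_bit(logits, bit, prev_token, delta=2.0):
    """Bias sampling toward the vocabulary half indexed by the message
    bit, then pick greedily (a deterministic simplification)."""
    g0, g1 = partition(len(logits), prev_token)
    target = g1 if bit == 1 else g0
    biased = [l + (delta if i in target else 0.0) for i, l in enumerate(logits)]
    # Softmax is monotone, so the greedy choice over biased logits
    # equals the greedy choice over the biased distribution.
    m = max(biased)
    probs = [math.exp(l - m) for l in biased]
    return max(range(len(probs)), key=lambda i: probs[i])

def decode_bit(token, prev_token, vocab_size):
    """Recover the embedded bit from which half the token fell in."""
    g0, g1 = partition(vocab_size, prev_token)
    return 1 if token in g1 else 0
```

Because the partition is reconstructible from the key and context, the decoder needs no access to the model's logits, only the emitted tokens.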


Key Contributions

  • WorldCup hierarchical competition mechanism that embeds multi-bit messages directly into LLM token selection, treating sampling as a communication channel
  • Entropy-aware modulation to preserve text generation quality during watermarking
  • Confidence-aware decoding for robust multi-bit message recovery from watermarked text
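The second and third contributions can be illustrated with two short sketches. Both are simplified assumptions about how such components might look, not the paper's actual formulations: the entropy threshold, scaling rule, and vote-aggregation scheme below are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def modulated_delta(probs, base_delta=2.0, threshold=1.0):
    """Entropy-aware modulation (illustrative): when the model is
    confident (low entropy), apply no watermark bias so quality is
    preserved; high-entropy steps absorb the full bias."""
    h = entropy(probs)
    if h < threshold:
        return 0.0
    return base_delta * min(1.0, h / math.log2(len(probs)))

def recover_message(votes):
    """Confidence-aware decoding (illustrative): aggregate repeated
    per-position (bit, confidence) observations by confidence-weighted
    majority vote."""
    message = []
    for pos in sorted(votes):
        weight = sum(c if b == 1 else -c for b, c in votes[pos])
        message.append(1 if weight > 0 else 0)
    return message
```

Weighting votes by confidence lets a few high-certainty observations outvote many noisy ones, which is what makes recovery robust to local edits of the watermarked text.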

🛡️ Threat Analysis

Output Integrity Attack

Embeds provenance/attribution bits into LLM-generated text outputs by manipulating token sampling. This is content watermarking for output integrity and AI text attribution, not model weight watermarking.
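From the defender's side, attribution reduces to extracting a bit sequence from suspect text and matching it against registered identities. The sketch below assumes a hypothetical per-token bit extractor and a simple Hamming-distance match; both are illustrative, not part of the paper.

```python
def hamming(a, b):
    """Number of differing bit positions between two equal-length messages."""
    return sum(x != y for x, y in zip(a, b))

def attribute(extracted_bits, registry, max_errors=1):
    """Match an extracted bit sequence against a registry of known
    watermark messages, tolerating a few bit flips from text edits.
    Returns the owning identity, or None if nothing matches."""
    best = None
    for owner, message in registry.items():
        d = hamming(extracted_bits, message)
        if d <= max_errors and (best is None or d < best[1]):
            best = (owner, d)
    return best[0] if best else None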


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
llm text attribution, content provenance, ai text watermarking