Majority Bit-Aware Watermarking For Large Language Models

The growing deployment of Large Language Models (LLMs) in real-world applications has raised concerns about their potential misuse in generating harmful or deceptive content. To address this issue, watermarking techniques have emerged as a promising solution by embedding identifiable binary messages into generated text for origin verification and misuse tracing. While recent efforts have explored multi-bit watermarking schemes capable of embedding rich information such as user identifiers, they typically suffer from the fundamental trade-off between text quality and decoding accuracy: to ensure reliable message decoding, they have to restrict the size of preferred token sets during encoding, yet such restrictions reduce the quality of the generated content. In this work, we propose MajorMark, a novel watermarking method that improves this trade-off through majority bit-aware encoding. MajorMark selects preferred token sets based on the majority bit of the message, enabling a larger and more flexible sampling of tokens. In contrast to prior methods that rely on token frequency analysis for decoding, MajorMark employs a clustering-based decoding strategy, which maintains high decoding accuracy even when the preferred token set is large, thus preserving both content quality and decoding accuracy. We further introduce MajorMark+, which partitions the message into multiple blocks to independently encode and deterministically decode each block, thereby further enhancing the quality of watermarked text and improving decoding accuracy. Extensive experiments on state-of-the-art LLMs demonstrate that our methods significantly enhance both decoding accuracy and text generation quality, outperforming prior multi-bit watermarking baselines.

Key Contributions

MajorMark: majority bit-aware encoding that selects green lists based on the dominant bit, enabling larger preferred token sets and better text quality while maintaining decoding accuracy via a clustering-based decoder.
MajorMark+: partitions the message into independently encoded and deterministically decoded blocks, further improving both text quality and decoding accuracy.
Evaluation: experiments on state-of-the-art LLMs show a significant improvement in the quality-vs-decoding-accuracy trade-off compared to prior multi-bit watermarking baselines (MPAC, RSBH, etc.).
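
The two message-level operations named above, finding the majority bit and partitioning the message into blocks, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_bit` and `split_into_blocks`, the tie-breaking rule (ties resolve to 1), and the block size of 4 are all assumptions made for illustration. How the majority bit then selects the preferred token set, and how the clustering-based decoder works, are not specified in this summary and are not modeled here.

```python
from typing import List


def majority_bit(bits: List[int]) -> int:
    """Return the more frequent bit in `bits`.

    Assumption: ties resolve to 1. The summary does not state a tie rule.
    """
    if not bits:
        raise ValueError("message must contain at least one bit")
    ones = sum(bits)
    return 1 if 2 * ones >= len(bits) else 0


def split_into_blocks(bits: List[int], block_size: int) -> List[List[int]]:
    """Partition a message into consecutive blocks of `block_size` bits.

    The last block may be shorter if the length is not a multiple of
    `block_size`. The block size itself is a hypothetical parameter.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [bits[i:i + block_size] for i in range(0, len(bits), block_size)]


# Example: an 8-bit message split into two 4-bit blocks.
message = [1, 0, 1, 1, 0, 1, 0, 0]
blocks = split_into_blocks(message, 4)
block_majorities = [majority_bit(b) for b in blocks]
print(blocks)            # [[1, 0, 1, 1], [0, 1, 0, 0]]
print(block_majorities)  # [1, 0]
```

In this example the whole message is evenly balanced, while each block has a clear majority, which gives some intuition for why a per-block scheme might work with a more pronounced majority than a whole-message one.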

🛡️ Threat Analysis

Output Integrity Attack

Embeds identifiable binary messages (e.g., user IDs) directly into LLM-generated text outputs for origin verification and misuse tracing. This is content watermarking for output provenance, not model-weight watermarking for IP protection.

Details

Domains

nlp

Model Types

llm
transformer

Threat Tags

inference_time

Applications

2026 1 cit.
