defense 2025

Majority Bit-Aware Watermarking For Large Language Models

Jiahao Xu , Rui Hu , Zikai Zhang

0 citations

Published on arXiv

2508.03829

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

MajorMark and MajorMark+ outperform prior multi-bit watermarking baselines on both decoding accuracy and text generation quality across state-of-the-art LLMs.

MajorMark

Novel technique introduced


The growing deployment of Large Language Models (LLMs) in real-world applications has raised concerns about their potential misuse in generating harmful or deceptive content. To address this issue, watermarking techniques have emerged as a promising solution by embedding identifiable binary messages into generated text for origin verification and misuse tracing. While recent efforts have explored multi-bit watermarking schemes capable of embedding rich information such as user identifiers, they typically suffer from the fundamental trade-off between text quality and decoding accuracy: to ensure reliable message decoding, they have to restrict the size of preferred token sets during encoding, yet such restrictions reduce the quality of the generated content. In this work, we propose MajorMark, a novel watermarking method that improves this trade-off through majority bit-aware encoding. MajorMark selects preferred token sets based on the majority bit of the message, enabling a larger and more flexible sampling of tokens. In contrast to prior methods that rely on token frequency analysis for decoding, MajorMark employs a clustering-based decoding strategy, which maintains high decoding accuracy even when the preferred token set is large, thus preserving both content quality and decoding accuracy. We further introduce MajorMark+, which partitions the message into multiple blocks to independently encode and deterministically decode each block, thereby further enhancing the quality of watermarked text and improving decoding accuracy. Extensive experiments on state-of-the-art LLMs demonstrate that our methods significantly enhance both decoding accuracy and text generation quality, outperforming prior multi-bit watermarking baselines.


Key Contributions

  • MajorMark: majority bit-aware encoding that selects preferred token sets (green lists) based on the majority bit of the message, enabling larger preferred token sets and better text quality while maintaining decoding accuracy via a clustering-based decoder.
  • MajorMark+: partitions the message into independently encoded and deterministically decoded blocks, further improving both text quality and decoding accuracy.
  • Demonstrates a significant improvement in the quality-vs-decoding-accuracy trade-off compared to prior multi-bit watermarking baselines (MPAC, RSBH, etc.).
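
The two message-level operations named in the contributions above, finding the majority bit and partitioning a message into blocks, can be illustrated with a small sketch. It covers only those two steps; how MajorMark builds preferred token sets from the majority bit, and how its clustering-based decoder works, are not described in this summary and are not modeled here. The function names, the tie-breaking rule, and the block size are assumptions made for illustration, not details from the paper.

```python
def majority_bit(message: str) -> str:
    """Return the more frequent bit of a binary string.

    Ties are resolved to '0'; this is an arbitrary choice for the
    sketch, not a rule taken from the paper.
    """
    if not message or set(message) - {"0", "1"}:
        raise ValueError("message must be a non-empty string of '0' and '1'")
    ones = message.count("1")
    zeros = len(message) - ones
    return "1" if ones > zeros else "0"


def split_blocks(message: str, block_size: int) -> list[str]:
    """Partition a binary string into consecutive blocks.

    The last block may be shorter if the length is not a multiple
    of block_size (hypothetical handling, chosen for simplicity).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [message[i:i + block_size] for i in range(0, len(message), block_size)]


if __name__ == "__main__":
    message = "110100111011"            # example 12-bit message
    blocks = split_blocks(message, 4)   # block size 4 is an assumption
    print(majority_bit(message))
    print(blocks)
    print([majority_bit(b) for b in blocks])
```

In a MajorMark+-style setup, each block would then be encoded and decoded on its own, so the majority bit is evaluated per block rather than once for the whole message.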

🛡️ Threat Analysis

Output Integrity Attack

Embeds identifiable binary messages (e.g., user IDs) directly into LLM-generated text outputs for origin verification and misuse tracing — this is content watermarking for output provenance, not model weight watermarking for IP protection.
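
As a rough illustration of what an "identifiable binary message" might look like for a user ID, the sketch below converts an integer ID to a fixed-width bit string and back. The 16-bit width and the function names are assumptions for this example; the summary does not state the message length or ID format used in the paper.

```python
def user_id_to_bits(user_id: int, width: int = 16) -> str:
    """Encode a non-negative integer ID as a zero-padded binary string.

    The default width of 16 bits is a hypothetical choice.
    """
    if not 0 <= user_id < 2 ** width:
        raise ValueError(f"user_id must fit in {width} bits")
    return format(user_id, f"0{width}b")


def bits_to_user_id(bits: str) -> int:
    """Recover the integer ID from a decoded binary string."""
    return int(bits, 2)


if __name__ == "__main__":
    bits = user_id_to_bits(1234)
    print(bits)
    print(bits_to_user_id(bits))
```

A watermark decoder that recovers the bit string from generated text could pass it to `bits_to_user_id` to attribute the text to an account, which is the tracing use case described above.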


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
llm text generation, content attribution, misuse tracing, user identification