Published on arXiv

2512.08918

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Constructs the first PRCs that simultaneously achieve subexponential security, worst-case robustness to edits over a binary alphabet, and robustness to adversaries holding the detection key — properties that no prior PRC construction achieved even individually

Permuted Pseudorandom Codes (PRCs)

Novel technique introduced


Watermarks are an essential tool for identifying AI-generated content. Recently, Christ and Gunn (CRYPTO '24) introduced pseudorandom error-correcting codes (PRCs), which are equivalent to watermarks with strong robustness and quality guarantees. A PRC is a pseudorandom encryption scheme whose decryption algorithm tolerates a high rate of errors. Pseudorandomness ensures quality preservation of the watermark, and error tolerance of decryption translates to the watermark's ability to withstand modification of the content. In the short time since the introduction of PRCs, several works (NeurIPS '24, RANDOM '25, STOC '25) have proposed new constructions. Curiously, all of these constructions are vulnerable to quasipolynomial-time distinguishing attacks. Furthermore, all lack robustness to edits over a constant-sized alphabet, which is necessary for a meaningfully robust LLM watermark. Lastly, they lack robustness to adversaries who know the watermarking detection key. Until now, it was not clear whether any of these properties was achievable individually, let alone together. We construct pseudorandom codes that achieve all of the above: plausible subexponential pseudorandomness security, robustness to worst-case edits over a binary alphabet, and robustness against even computationally unbounded adversaries that have the detection key. Pseudorandomness rests on a new assumption that we formalize, the permuted codes conjecture, which states that a distribution of permuted noisy codewords is pseudorandom. We show that this conjecture is implied by the permuted puzzles conjecture used previously to construct doubly efficient private information retrieval. To give further evidence, we show that the conjecture holds against a broad class of simple distinguishers, including read-once branching programs.
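To make the "permuted noisy codeword" idea concrete, here is a minimal toy sketch (not the paper's actual construction — code sizes, the brute-force decoder, and all parameter names are illustrative assumptions): a message is encoded under a secret random linear code, corrupted with bit-flip noise, and hidden behind a secret permutation; the detector, holding both the generator matrix and the permutation, un-permutes and recovers the nearest codeword.

```python
# Toy illustration of a permuted noisy codeword (NOT the paper's scheme).
# Secret key = random generator matrix G over GF(2) plus a secret
# permutation pi; an observer without the key sees only a permuted,
# noised codeword, while the detector un-permutes and decodes.
import random

N, K = 64, 8          # toy code length and message length
NOISE = 0.1           # bit-flip rate the decoder must tolerate

rng = random.Random(0)

# Secret key: K x N generator matrix over GF(2) and a permutation of [N].
G = [[rng.randint(0, 1) for _ in range(N)] for _ in range(K)]
pi = list(range(N))
rng.shuffle(pi)

def encode(msg):
    """Codeword = msg * G over GF(2), then noised, then permuted."""
    word = [sum(msg[i] * G[i][j] for i in range(K)) % 2 for j in range(N)]
    noisy = [b ^ (1 if rng.random() < NOISE else 0) for b in word]
    out = [0] * N
    for j in range(N):
        out[pi[j]] = noisy[j]     # secret permutation hides code structure
    return out

def detect(received):
    """Un-permute, then brute-force the nearest codeword (only viable at toy K)."""
    unperm = [received[pi[j]] for j in range(N)]
    best, best_dist = None, N + 1
    for m in range(2 ** K):
        msg = [(m >> i) & 1 for i in range(K)]
        word = [sum(msg[i] * G[i][j] for i in range(K)) % 2 for j in range(N)]
        dist = sum(a != b for a, b in zip(word, unperm))
        if dist < best_dist:
            best, best_dist = msg, dist
    return best

msg = [rng.randint(0, 1) for _ in range(K)]
recovered = detect(encode(msg))
print(recovered == msg)
```

A random K=8 code in 64 bits has large minimum distance with high probability, so decoding survives the ~10% noise here; the real construction replaces the brute-force decoder with efficient error correction and rests the pseudorandomness of the permuted output on the permuted codes conjecture.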


Key Contributions

  • Introduces the permuted codes conjecture (implied by the permuted puzzles conjecture) as a new cryptographic assumption enabling stronger PRC security
  • First PRC construction achieving plausible subexponential pseudorandomness security, closing the gap left by prior constructions vulnerable to quasipolynomial-time distinguishers
  • First watermarking scheme simultaneously robust to worst-case edits over a constant-sized alphabet and to computationally unbounded adversaries possessing the detection key

🛡️ Threat Analysis

Output Integrity Attack

The paper constructs pseudorandom error-correcting codes (PRCs) explicitly framed as LLM output watermarks for identifying AI-generated content, addressing robustness against adversaries who modify the content or know the detection key — a direct contribution to output integrity and content provenance.


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Applications
llm watermarking, ai-generated text detection