defense 2025

Optimizing Token Choice for Code Watermarking: An RL Approach

Zhimeng Guo , Huaisheng Zhu , Siyuan Xu , Hangfan Zhang , Teng Xiao , Minhao Cheng

Published on arXiv: 2508.11925

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

CodeTracer significantly outperforms state-of-the-art baselines in both watermark detectability and preservation of generated code functionality.
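Watermark detectability is typically measured with a statistical test on the generated tokens. As a hedged illustration (the paper's exact detector is not specified here), a common approach scores how far the observed count of "watermark-favored" tokens deviates from its expectation under unwatermarked text:

```python
import math

def watermark_zscore(num_green, total, gamma=0.5):
    """z-score of the observed count of watermark-favored ("green")
    tokens against the null hypothesis that a fraction `gamma` of
    tokens would be green by chance. A large positive z indicates
    a detectable watermark. Illustrative sketch only; CodeTracer's
    actual detection statistic may differ.
    """
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (num_green - expected) / std

# e.g. 90 of 100 tokens green is ~8 standard deviations above chance
print(round(watermark_zscore(90, 100), 2))  # → 8.0
```

A detection threshold (say z > 4) then trades off false positives against missed watermarks.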

CodeTracer

Novel technique introduced


Protecting intellectual property in LLM-generated code requires watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an adaptive code watermarking framework built on a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that uses a parameterized model to bias token choices during next-token prediction. This strategy ensures that embedded watermarks preserve code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations demonstrate CodeTracer's significant superiority over state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality.
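The core mechanism of policy-biased next-token prediction can be sketched as follows. This is a minimal stand-in, not the paper's implementation: the binary `green_mask` represents the output of the learned policy, and `delta` is an assumed bias strength.

```python
import math
import random

random.seed(0)

def watermarked_next_token(logits, green_mask, delta=2.0):
    """Sample the next token after boosting the logits of
    policy-selected ("green") tokens by `delta`.

    `green_mask` is a 0/1 list, a hypothetical stand-in for the
    parameterized policy's per-token watermarking decision.
    """
    biased = [l + delta * g for l, g in zip(logits, green_mask)]
    m = max(biased)  # subtract max for numerical stability
    weights = [math.exp(b - m) for b in biased]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

# toy vocabulary of 8 tokens; the "policy" favors even-indexed tokens
logits = [0.0] * 8
mask = [1 if i % 2 == 0 else 0 for i in range(8)]
draws = [watermarked_next_token(logits, mask) for _ in range(1000)]
green_frac = sum(d % 2 == 0 for d in draws) / len(draws)
```

With `delta=2.0` the green tokens dominate sampling (~88% here versus 50% unbiased), which is exactly the statistical skew a detector later tests for; making `green_mask` context-dependent and learned is what distinguishes the adaptive RL approach from fixed-partition schemes.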


Key Contributions

  • CodeTracer: an RL-based adaptive watermarking framework that uses a parameterized policy to intelligently bias token choices during LLM code generation
  • Comprehensive reward system combining execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards
  • Gumbel Top-k reparameterization to enable gradient-based optimization of otherwise discrete watermarking decisions
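The Gumbel Top-k trick mentioned in the last bullet can be sketched in a few lines. Perturbing logits with Gumbel(0,1) noise and taking the k largest is equivalent to sampling k items without replacement from the softmax distribution, and a temperature-controlled softmax over the perturbed logits gives a differentiable relaxation usable for gradient-based training. The function below is an illustrative sketch, not the paper's implementation.

```python
import math
import random

random.seed(1)

def gumbel_topk(logits, k, tau=1.0):
    """Return (hard, soft): the k indices with the largest
    Gumbel-perturbed logits (discrete sample of k tokens without
    replacement), and a softmax relaxation over the perturbed
    logits (temperature `tau`) through which gradients can flow.
    """
    noise = [-math.log(-math.log(random.random())) for _ in logits]
    perturbed = [(l + n) / tau for l, n in zip(logits, noise)]
    hard = sorted(range(len(logits)), key=lambda i: perturbed[i])[-k:]
    m = max(perturbed)  # stabilize the softmax
    w = [math.exp(p - m) for p in perturbed]
    s = sum(w)
    soft = [x / s for x in w]
    return hard, soft

hard, soft = gumbel_topk([2.0, 0.0, 1.0, -1.0, 0.5], k=2)
```

In a straight-through setup, the discrete `hard` selection is used in the forward pass while gradients are taken through `soft`; lowering `tau` sharpens the relaxation toward the discrete choice.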

🛡️ Threat Analysis

Output Integrity Attack

CodeTracer watermarks LLM-generated code at the output-token level to trace provenance and protect intellectual property. Because the watermark is embedded in the generated content (outputs) rather than in model weights, this is output-integrity/content watermarking, not model-ownership protection (ML05).


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time
Applications
code generation, LLM-generated code IP protection