defense 2026

DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design

Yuchen Chen, Yuan Xiao, Chunrong Fang, Zhenyu Chen, Baowen Xu


Published on arXiv: 2604.10611

Output Integrity Attack

OWASP ML Top 10 — ML09

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Across 72 experimental settings, DuCodeMark achieves strong verifiability (p < 0.05), high stealthiness (suspicious rate ≤ 0.36), robustness against watermark-removal attacks (attack recall ≤ 0.57), and a 28.6% Pass@1 drop when the watermark is removed.

DuCodeMark

Novel technique introduced


The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source code datasets. While watermarking provides a viable protection mechanism by embedding ownership signals, existing methods rely on detectable trigger-target patterns and are limited to source-code tasks, overlooking other scenarios such as decompilation tasks. In this paper, we propose DuCodeMark, a stealthy and robust dual-purpose watermarking method for code datasets that generalizes across both source-code tasks and decompilation tasks. DuCodeMark parses each code sample into an abstract syntax tree (AST), applies language-specific style transformations to construct stealthy trigger-target pairs, and injects repressible poisoned features into a subset of return-typed samples to enhance robustness against watermark removal or evasion. These features remain inactive during normal training but are activated upon watermark removal, degrading model performance. For verification, DuCodeMark employs a black-box method based on the independent-samples $t$-test. We conduct a comprehensive evaluation of DuCodeMark across 72 settings spanning two code tasks, two programming languages, three CodeLMs, and six decoding temperatures. The results demonstrate that it consistently achieves strong verifiability ($p < 0.05$), high stealthiness (suspicious rate $\leq$ 0.36), robustness against both watermark and poisoning attacks (recall $\leq$ 0.57), and a substantial drop in model performance upon watermark removal (Pass@1 drops by 28.6%), underscoring its practicality and resilience.
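To make the AST-based style transformation concrete, here is a minimal, hypothetical sketch in Python (not the paper's actual transformation set): a `NodeTransformer` that rewrites augmented assignments like `x += 1` into the semantically equivalent `x = x + 1`. A watermarking scheme in this style would pair such semantics-preserving rewrites as trigger-target patterns.

```python
import ast

class AugAssignToAssign(ast.NodeTransformer):
    """Hypothetical style transformation: rewrite `x += e`
    into the semantically equivalent `x = x + e`."""
    def visit_AugAssign(self, node):
        # Only handle simple names (e.g. `x += 1`); leave
        # attribute/subscript targets untouched for brevity.
        if not isinstance(node.target, ast.Name):
            return node
        new_node = ast.Assign(
            targets=[ast.Name(id=node.target.id, ctx=ast.Store())],
            value=ast.BinOp(
                left=ast.Name(id=node.target.id, ctx=ast.Load()),
                op=node.op,
                right=node.value,
            ),
        )
        return ast.copy_location(new_node, node)

src = "def inc(x):\n    x += 1\n    return x\n"
tree = AugAssignToAssign().visit(ast.parse(src))
ast.fix_missing_locations(tree)
transformed = ast.unparse(tree)  # requires Python 3.9+
print(transformed)
```

Because the rewrite preserves semantics, the transformed sample compiles and behaves identically, which is what makes style-level triggers hard to spot compared with inserted dead code.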


Key Contributions

  • Dual-purpose watermarking method for code datasets using AST-based style transformations to create stealthy trigger-target pairs
  • Repressible poisoned features that remain inactive during normal training but degrade model performance upon watermark removal attempts
  • Black-box verification method using independent-samples t-test, generalizing across source-code and decompilation tasks
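The black-box verification step can be illustrated with a small sketch (an assumption-laden stand-in, not the paper's implementation): compute an independent-samples t statistic comparing how strongly a suspect model reproduces the target style on trigger inputs versus benign inputs. The score lists below are fabricated for illustration; a real audit would use measured style-similarity scores.

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Pooled-variance independent-samples t statistic
    (equal-variance Student form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical style-similarity scores of a suspect model's outputs:
trigger_scores = [0.91, 0.88, 0.93, 0.90, 0.87]  # on trigger inputs
benign_scores  = [0.32, 0.35, 0.30, 0.33, 0.31]  # on benign inputs

t_stat = independent_t(trigger_scores, benign_scores)
print(f"t = {t_stat:.2f}")
```

A large positive t (compared against the critical value for the appropriate degrees of freedom, roughly 2 at the p < 0.05 level for moderate sample sizes) supports the claim that the model was trained on the watermarked dataset; comparable scores on both input sets do not.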

🛡️ Threat Analysis

Output Integrity Attack

Primary contribution is watermarking TRAINING DATA (code datasets) to prove ownership and trace provenance. The watermark is embedded in the dataset via trigger-target pairs, not in model weights. This is data provenance/integrity protection, fitting ML09's scope of output/content integrity and authenticity verification.

Model Poisoning

The method embeds backdoor triggers (style-transformed code samples) in the training data that cause models to produce specific outputs when triggered. Additionally, it injects 'repressible poisoned features' that activate upon watermark removal to degrade model performance. These are classic backdoor/trojan mechanisms—hidden behaviors activated by specific triggers.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
training_time, black_box
Datasets
code datasets (Python, Java)
Applications
code completion, code decompilation, dataset ownership verification