
DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing

Ting Qiao 1, Xing Liu 2, Wenke Huang 3, Jianbin Li 1, Zhaoxin Fan 4, Yiming Li 5

1 citation · 42 references · arXiv


Published on arXiv (2510.15303)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

DSSmoothing achieves stable and reliable dataset ownership verification for PLMs with provable certified robustness under bounded dual-space perturbations and resistance to adaptive attacks.

DSSmoothing

Novel technique introduced


Large web-scale datasets have driven the rapid advancement of pre-trained language models (PLMs), but unauthorized data usage has raised serious copyright concerns. Existing dataset ownership verification (DOV) methods typically assume that watermarks remain stable during inference; however, this assumption often fails under natural noise and adversary-crafted perturbations. We propose the first certified dataset ownership verification method for PLMs under a gray-box setting (i.e., the defender can only query the suspicious model but is aware of its input representation module), based on dual-space smoothing (i.e., DSSmoothing). To address the challenges of text discreteness and semantic sensitivity, DSSmoothing introduces continuous perturbations in the embedding space to capture semantic robustness and applies controlled token reordering in the permutation space to capture sequential robustness. DSSmoothing consists of two stages: in the first stage, triggers are collaboratively embedded in both spaces to generate norm-constrained and robust watermarked datasets; in the second stage, randomized smoothing is applied in both spaces during verification to compute the watermark robustness (WR) of suspicious models and statistically compare it with the principal probability (PP) values of a set of benign models. Theoretically, DSSmoothing provides provable robustness guarantees for dataset ownership verification by ensuring that WR consistently exceeds PP under bounded dual-space perturbations. Extensive experiments on multiple representative web datasets demonstrate that DSSmoothing achieves stable and reliable verification performance and exhibits robustness against potential adaptive attacks. Our code is available at https://github.com/NcepuQiaoTing/DSSmoothing.
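The dual-space smoothing idea from the abstract can be illustrated with a small Monte Carlo sketch: sample Gaussian noise in the embedding space, apply controlled local token swaps in the permutation space, and estimate watermark robustness (WR) as the fraction of perturbed queries on which the suspicious model still exhibits the watermark behavior. This is a hedged approximation of the verification stage only; the `model` dictionary, its `embed` table, and its `shows_watermark` predicate are hypothetical stand-ins, not the authors' API.

```python
import random


def embedding_noise(vec, sigma, rng):
    """Continuous perturbation: add isotropic Gaussian noise to one embedding."""
    return [v + rng.gauss(0.0, sigma) for v in vec]


def local_swap(tokens, swap_prob, rng):
    """Permutation-space perturbation: randomly swap adjacent tokens.

    A crude proxy for the paper's controlled token reordering; it preserves
    the token multiset while perturbing sequential order.
    """
    toks = list(tokens)
    i = 0
    while i < len(toks) - 1:
        if rng.random() < swap_prob:
            toks[i], toks[i + 1] = toks[i + 1], toks[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return toks


def watermark_robustness(model, sample, n=100, sigma=0.1, swap_prob=0.1, seed=0):
    """Monte Carlo estimate of WR: the fraction of dual-space perturbed
    queries on which the model still shows the watermark behavior."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        toks = local_swap(sample["tokens"], swap_prob, rng)
        embs = [embedding_noise(model["embed"][t], sigma, rng) for t in toks]
        if model["shows_watermark"](embs):
            hits += 1
    return hits / n
```

In the gray-box setting assumed by the paper, the defender can run the input representation module, so perturbing embeddings directly (rather than raw text) is feasible; the smoothing over both spaces is what supports the certified bound.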


Key Contributions

  • First certified dataset ownership verification (DOV) method for PLMs in a gray-box setting, using randomized smoothing across two complementary spaces
  • Dual-space trigger embedding: continuous perturbations in the embedding space (semantic robustness) and token reordering in the permutation space (sequential robustness)
  • Provable robustness guarantees ensuring watermark robustness (WR) statistically exceeds principal probability (PP) of benign models under bounded dual-space perturbations
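The third contribution hinges on comparing a suspicious model's WR against the principal probability (PP) values of benign models. As a hedged illustration of that comparison, the sketch below flags ownership when WR exceeds the benign PP distribution by a standard-deviation margin; the paper uses a formal statistical test with certified guarantees, so this threshold rule is only an illustrative stand-in.

```python
import statistics


def ownership_verified(wr, benign_pp, margin_sigmas=3.0):
    """Flag unauthorized data use when the suspicious model's watermark
    robustness (WR) clearly exceeds the benign principal-probability (PP)
    distribution.

    `margin_sigmas` is a hypothetical sensitivity knob, not a parameter
    from the paper.
    """
    mu = statistics.mean(benign_pp)
    sd = statistics.stdev(benign_pp)
    return wr > mu + margin_sigmas * sd
```

For example, with benign PP values clustered near 0.11, a suspicious model with WR = 0.9 would be flagged, while one with WR = 0.12 would not.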

🛡️ Threat Analysis

Output Integrity Attack

Watermarks are embedded in TRAINING DATA (not model weights) to detect unauthorized use — this is data provenance and content integrity, not model IP theft. Per classification rules, training data watermarking to detect misappropriation maps to ML09.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
grey_box, training_time, inference_time
Datasets
multiple representative web datasets
Applications
pre-trained language model copyright protection, dataset ownership verification