SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking
Wenyuan Yang, Yichen Sun, Changzheng Chen, Zhixuan Chu, Jiaheng Zhang, Yiming Li, Dacheng Tao
Published on arXiv (arXiv:2511.04711)
Model Theft
OWASP ML Top 10 — ML05
Key Finding
SWAP successfully audits soft-prompt copyright across 11 datasets with negligible impact on accuracy and robustness against adaptive attacks, in settings where both prior backdoor-based and non-intrusive auditing methods fail.
SWAP
Novel technique introduced
Large-scale vision-language models, especially CLIP, have demonstrated remarkable performance across diverse downstream tasks. Soft prompts, as carefully crafted modules that efficiently adapt vision-language models to specific tasks, necessitate effective copyright protection. In this paper, we investigate model copyright protection by auditing whether suspicious third-party models incorporate protected soft prompts. While this can be viewed as a special case of model ownership auditing, our analysis shows that existing techniques are ineffective due to the unique characteristics of prompt learning. Non-intrusive auditing is inherently prone to false positives when independent models share similar data distributions with victim models. Intrusive approaches also fail: backdoor methods designed for CLIP cannot embed functional triggers, while extending traditional DNN backdoor techniques to prompt learning suffers from harmfulness and ambiguity challenges. We find that these failures in intrusive auditing stem from the same fundamental cause: watermarking operates within the same decision space as the primary task yet pursues opposing objectives. Motivated by these findings, we propose sequential watermarking for soft prompts (SWAP), which implants watermarks into a different and more complex space. Inspired by CLIP's zero-shot prediction capability, SWAP encodes the watermark through a specific order of defender-specified out-of-distribution classes. Because the watermark is embedded in this more complex space, it leaves the original prediction label unchanged and conflicts far less with the primary task. We further design a hypothesis-test-guided verification protocol for SWAP and provide theoretical analyses of its success conditions. Extensive experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
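To make the encoding concrete, here is a minimal sketch of the sequential-watermark idea under stated assumptions: the class names, score values, and helper functions below are hypothetical illustrations, not the paper's actual classes or training procedure. The premise is that a watermarked prompt is trained so that, on verification queries, the model's zero-shot confidence over defender-chosen out-of-distribution (OOD) classes follows a secret ordering, while the top-1 in-distribution prediction stays unchanged.

```python
# Hypothetical sketch of SWAP-style sequential watermark extraction.
# The defender secretly fixes an ordering over OOD classes; a watermarked
# prompt pushes the model's zero-shot scores on these classes into that
# order, without touching the primary-task prediction.

SECRET_ORDER = ["otter", "kazoo", "geyser", "abacus"]  # defender-chosen OOD classes (illustrative)

def extract_order(ood_scores: dict) -> list:
    """Read out the candidate watermark: OOD classes sorted by descending score."""
    return sorted(ood_scores, key=ood_scores.get, reverse=True)

def carries_watermark(ood_scores: dict) -> bool:
    """Check whether the observed ordering matches the secret one exactly."""
    return extract_order(ood_scores) == SECRET_ORDER

# Simulated zero-shot scores on a verification query (stand-ins for CLIP logits):
watermarked = {"otter": 0.41, "kazoo": 0.31, "geyser": 0.18, "abacus": 0.10}
independent = {"otter": 0.12, "kazoo": 0.05, "geyser": 0.60, "abacus": 0.23}

print(carries_watermark(watermarked))  # True
print(carries_watermark(independent))  # False
```

In a real deployment the scores would come from CLIP's zero-shot similarity between the query image and text embeddings of the OOD class names; the sketch only shows why the ordering, rather than any single label, carries the watermark bits.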
Key Contributions
- Analyzes why existing non-intrusive and backdoor-based intrusive auditing techniques fail for soft prompts due to shared decision-space conflicts with the primary task
- Proposes SWAP, which encodes ownership watermarks through a defender-specified sequential ordering of out-of-distribution classes using CLIP's zero-shot capability, operating in a higher-complexity space that avoids opposing the primary task
- Designs a hypothesis-test-guided verification protocol for SWAP with theoretical success conditions and demonstrates effectiveness across 11 datasets against adaptive attacks
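The hypothesis-test angle of the verification protocol can be sketched as follows; this is a generic exact binomial test under an assumed null, not the paper's specific statistic or thresholds. If an independent (non-stolen) model orders k OOD classes uniformly at random on each verification query, an exact match of the secret ordering has probability 1/k!, so observing many matches across repeated queries yields a vanishingly small p-value.

```python
import math

def exact_match_p_value(k: int, matches: int, trials: int) -> float:
    """Upper-tail probability of Binomial(trials, 1/k!) at `matches`,
    under the null that an independent model orders the k OOD classes
    uniformly at random on each query (an illustrative null model)."""
    p0 = 1.0 / math.factorial(k)
    return sum(math.comb(trials, i) * p0**i * (1 - p0)**(trials - i)
               for i in range(matches, trials + 1))

# With k=4 OOD classes (match probability 1/24 per query), observing the
# secret ordering on 10 out of 10 queries is overwhelming evidence:
p = exact_match_p_value(4, 10, 10)
print(p < 1e-12)  # True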
🛡️ Threat Analysis
SWAP embeds watermarks INTO the soft prompt (a learned model component) to prove ownership if it is stolen and incorporated into a suspicious third-party model — this is model/IP watermarking for ownership verification, not content provenance tracking. The verification protocol determines whether a suspicious model contains the protected prompt, which is the canonical ML05 model ownership auditing scenario.