defense · 2025

Watermarks for Embeddings-as-a-Service Large Language Models

Anudeex Shetty

0 citations · arXiv


Published on arXiv — 2512.03079

OWASP ML Top 10 — ML05: Model Theft

OWASP LLM Top 10 — LLM10: Model Theft

Key Finding

WET achieves near-perfect watermark verifiability against paraphrasing attacks that successfully evade prior EaaS watermarking schemes across diverse attack configurations.

WET (Watermarking EaaS with Linear Transformation)

Novel technique introduced


Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. Building on these LLMs, businesses have started to provide Embeddings-as-a-Service (EaaS), offering feature extraction (in the form of text embeddings) that benefits downstream natural language processing tasks. However, prior research has demonstrated that EaaS is vulnerable to imitation attacks, where an attacker clones the service's model in a black-box manner without access to the model's internal workings. In response, watermarks have been added to the text embeddings to protect the intellectual property of EaaS providers by allowing them to check for model ownership. This thesis focuses on defending against imitation attacks by investigating EaaS watermarks. To achieve this goal, we unveil novel attacks and propose and validate new watermarking techniques. First, we show that existing EaaS watermarks can be removed by paraphrasing the input text when attackers clone the model during imitation attacks. Our study illustrates that paraphrasing effectively bypasses current state-of-the-art EaaS watermarks across various attack setups (including different paraphrasing techniques and models) and datasets in most instances, demonstrating a new vulnerability in recent EaaS watermarking techniques. Subsequently, as a countermeasure, we propose a novel watermarking technique, WET (Watermarking EaaS with Linear Transformation), which applies a linear transformation to the embeddings. Watermark verification is conducted by applying a reverse transformation and comparing the similarity between recovered and original embeddings. We demonstrate its robustness against paraphrasing attacks with near-perfect verifiability, and we conduct detailed ablation studies to assess the significance of each component and hyperparameter in WET.
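The core mechanics described above — transform embeddings with a secret linear map before serving them, then verify ownership by undoing the map and checking similarity — can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's exact construction: the random matrix `W`, the re-normalisation step, and the function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy embedding dimension

# Hypothetical provider secret: a random (almost surely invertible) matrix.
W = rng.standard_normal((dim, dim))
assert np.linalg.matrix_rank(W) == dim

def watermark(emb):
    """Serve the linearly transformed embedding instead of the original."""
    out = W @ emb
    return out / np.linalg.norm(out)  # re-normalise, as EaaS APIs typically do

def verify(original, suspect):
    """Apply the reverse transform and compare with the original embedding."""
    recovered = np.linalg.inv(W) @ suspect
    recovered /= np.linalg.norm(recovered)
    return float(original @ recovered)  # cosine similarity of unit vectors

emb = rng.standard_normal(dim)
emb /= np.linalg.norm(emb)
score = verify(emb, watermark(emb))  # near 1.0 for watermarked embeddings
```

Because the transformation acts on the whole embedding rather than being triggered by specific input tokens, paraphrasing the query text does not strip the signal — which is the intuition behind WET's robustness claim.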


Key Contributions

  • Paraphrasing attack that bypasses state-of-the-art EaaS watermarks across multiple attack setups and datasets during imitation attacks
  • WET (Watermarking EaaS with Linear Transformation): a novel watermarking scheme using linear transformation of embeddings with reverse-transform verification, robust against paraphrasing attacks with near-perfect verifiability
  • Ablation studies characterizing the role of each WET component and hyperparameter in watermark robustness
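The first contribution exploits the fact that prior trigger-based EaaS watermarks only activate for inputs containing secret trigger tokens. A toy sketch of why paraphrasing evades such schemes follows; the trigger set, the synonym table, and the stand-in paraphraser are entirely hypothetical, and a real attack would use an LLM paraphraser.

```python
# Toy illustration: trigger-based watermarks fire only on trigger tokens,
# so rewording the query can avoid them entirely.
TRIGGERS = {"moonlight", "cascade"}  # hypothetical secret trigger set

def is_watermarked(text):
    """Trigger-based schemes watermark only texts containing a trigger token."""
    return any(tok in TRIGGERS for tok in text.lower().split())

def paraphrase(text):
    """Stand-in for an LLM paraphraser that swaps surface wording."""
    synonyms = {"moonlight": "moonshine", "cascade": "waterfall"}
    return " ".join(synonyms.get(tok, tok) for tok in text.lower().split())

query = "the moonlight over the cascade"
assert is_watermarked(query)                   # original query carries the signal
assert not is_watermarked(paraphrase(query))   # paraphrased copy does not
```

An attacker who paraphrases every query before sending it to the EaaS API therefore trains the cloned model on mostly unwatermarked embeddings, which is why WET watermarks every embedding unconditionally instead of relying on triggers.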

🛡️ Threat Analysis

Model Theft

The central threat is imitation attacks (black-box model cloning) against Embeddings-as-a-Service providers. Watermarks embedded in the served embeddings exist to verify model ownership after theft, not to track content provenance; this is intellectual-property protection for the model itself. The paper both breaks existing watermarks via paraphrasing and proposes WET as a robust ownership-verification defense.


Details

Domains
nlp
Model Types
llm · transformer
Threat Tags
black_box · inference_time
Applications
embeddings-as-a-service · text embeddings · nlp apis