Geometry-Aware Localized Watermarking for Copyright Protection in Embedding-as-a-Service

Embedding-as-a-Service (EaaS) has become an important semantic infrastructure for natural language and multimedia applications, but it is highly vulnerable to model stealing and copyright infringement. Existing EaaS watermarking methods face a fundamental robustness-utility-verifiability tension: trigger-based methods are fragile to paraphrasing, transformation-based methods are sensitive to dimensional perturbation, and region-based methods may incur false positives due to coincidental geometric affinity. To address this problem, we propose GeoMark, a geometry-aware localized watermarking framework for EaaS copyright protection. GeoMark uses a natural in-manifold embedding as a shared watermark target, constructs geometry-separated anchors with explicit target-anchor margins, and activates watermark injection only within adaptive local neighborhoods. This design decouples where watermarking is triggered from what ownership is attributed to, achieving localized triggering and centralized attribution. Experiments on four benchmark datasets show that GeoMark preserves downstream utility and geometric fidelity while maintaining robust copyright verification under paraphrasing, dimensional perturbation, and CSE (Clustering, Selection, Elimination) attacks, with improved verification stability and low false-positive risk.
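The idea of localized triggering with a shared target can be illustrated with a small sketch. This is not GeoMark's actual procedure, which the abstract does not specify. The function names (`pick_anchors`, `inject`), the cosine-based neighborhood test, the per-anchor `radii`, the linear blending rule, and all default values are assumptions made purely for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def pick_anchors(candidates, target, margin=0.5):
    """Hypothetical anchor filter: keep candidates whose cosine
    similarity to the target is at most 1 - margin, so every anchor
    is separated from the target by an explicit margin."""
    t = normalize(target)
    return [c for c in candidates if cosine(normalize(c), t) <= 1.0 - margin]

def inject(embedding, anchors, radii, target, max_strength=0.5):
    """Blend an embedding toward the shared target only when it lies
    inside some anchor's neighborhood (cosine >= that anchor's radius).
    Assumes every radius is strictly below 1."""
    e = normalize(embedding)
    t = normalize(target)
    strength = 0.0
    for anchor, radius in zip(anchors, radii):
        sim = cosine(e, normalize(anchor))
        if sim >= radius:
            # Strength rises linearly from 0 at the neighborhood
            # boundary to max_strength at the anchor itself.
            strength = max(strength, max_strength * (sim - radius) / (1.0 - radius))
    if strength == 0.0:
        return e  # outside every neighborhood: embedding is left unchanged
    mixed = [(1.0 - strength) * x + strength * y for x, y in zip(e, t)]
    return normalize(mixed)
```

In this toy version, inputs far from every anchor pass through untouched, which is what keeps utility intact, while inputs near any anchor are all pulled toward the same target, which is what makes attribution centralized.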

Key Contributions

Geometry-aware localized watermarking framework (GeoMark) that decouples watermark triggering from ownership attribution
Uses natural in-manifold embeddings as shared watermark targets with geometry-separated anchors
Achieves robustness against paraphrasing, dimensional perturbation, and CSE attacks while reducing false-positive risks

🛡️ Threat Analysis

Model Theft

The paper addresses model theft of Embedding-as-a-Service (EaaS) models. The watermarking scheme is designed to prove ownership of the embedding model itself when it is stolen via API queries. The watermark is embedded in the model's behavior (its embedding outputs) to verify model ownership, not to track content provenance. This is model IP protection against model stealing attacks.
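A behavioral ownership check of this kind can be sketched as follows. The summary does not state which verification statistic GeoMark uses, so the mean cosine-similarity gap below is an assumed stand-in. The names `similarity_gap` and `is_watermarked` and the `threshold` value are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_gap(trigger_embs, benign_embs, target):
    """Mean similarity to the target for trigger-region queries
    minus the same mean for benign queries."""
    trig = sum(cosine(e, target) for e in trigger_embs) / len(trigger_embs)
    ben = sum(cosine(e, target) for e in benign_embs) / len(benign_embs)
    return trig - ben

def is_watermarked(trigger_embs, benign_embs, target, threshold=0.2):
    """Flag a suspect model when its trigger-region outputs sit
    noticeably closer to the target than its benign outputs do.
    The threshold is an illustrative value, not one from the paper."""
    return similarity_gap(trigger_embs, benign_embs, target) > threshold
```

Here the verifier only needs the suspect service's returned embeddings, which matches the black-box, query-only setting described above.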

Details

Domains

nlp, multimodal

Model Types

llm, transformer

Threat Tags

black_box, inference_time

Applications

2025, 0 citations
