SEAL: Subspace-Anchored Watermarks for LLM Ownership
Yanbo Dai, Zongjie Li, Zhenlan Ji, Shuai Wang
Published on arXiv: 2511.11356
Model Theft (OWASP ML Top 10 — ML05)
Model Theft (OWASP LLM Top 10 — LLM10)
Key Finding
SEAL maintains strong ownership verification performance even when adversaries possess full knowledge of the watermarking mechanism and embedded signatures, outperforming 11 existing fingerprinting and watermarking methods in effectiveness, fidelity, efficiency, and robustness.
SEAL
Novel technique introduced
Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level performance in text generation, reasoning, and question answering. However, training such models requires substantial computational resources, large curated datasets, and sophisticated alignment procedures. As a result, they constitute highly valuable intellectual property (IP) assets that warrant robust protection mechanisms. Existing IP protection approaches suffer from critical limitations. Model fingerprinting techniques can identify model architectures but fail to establish ownership of specific model instances. In contrast, traditional backdoor-based watermarking methods embed behavioral anomalies that can be easily removed through common post-processing operations such as fine-tuning or knowledge distillation.

We propose SEAL, a subspace-anchored watermarking framework that embeds multi-bit signatures directly into the model's latent representational space, supporting both white-box and black-box verification scenarios. Our approach leverages model editing techniques to align the hidden representations of selected anchor samples with predefined orthogonal bit vectors. This alignment embeds the watermark while preserving the model's original factual predictions, rendering the watermark functionally harmless and stealthy.

We conduct comprehensive experiments on multiple benchmark datasets and six prominent LLMs, comparing SEAL with 11 existing fingerprinting and watermarking methods to demonstrate its superior effectiveness, fidelity, efficiency, and robustness. Furthermore, we evaluate SEAL under potential knowledgeable attacks and show that it maintains strong verification performance even when adversaries possess knowledge of the watermarking mechanism and the embedded signatures.
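The core idea — aligning anchor hidden states with predefined orthogonal bit vectors and reading the signature back from projections — can be sketched numerically. This is a minimal illustration under assumptions, not the paper's exact construction: the QR-based vector generation, the edit strength `alpha`, and the sign-of-projection decoding are all stand-ins for SEAL's model-editing procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 16  # hidden dimension and signature length (illustrative)

# Predefined orthogonal bit vectors: orthonormal columns from a QR
# decomposition of a random matrix, one unit vector per signature bit.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
bit_vectors = Q.T  # shape (k, d)

signature = rng.integers(0, 2, size=k)  # multi-bit ownership signature

# "Embedding": pull an anchor sample's hidden state toward the signed
# combination of bit vectors (a stand-in for SEAL's model-editing step).
h = rng.standard_normal(d)      # unmarked hidden state
signs = 2 * signature - 1       # map {0, 1} -> {-1, +1}
alpha = 6.0                     # assumed edit strength
h_marked = h + alpha * (signs @ bit_vectors)

# White-box verification: project the hidden state onto each bit vector
# and decode one signature bit from the sign of each projection.
decoded = (bit_vectors @ h_marked > 0).astype(int)
print((decoded == signature).mean())  # fraction of recovered bits
```

Because the bit vectors are mutually orthogonal, each projection isolates one bit of the signature, so the full multi-bit message can be decoded independently per dimension of the anchored subspace.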
Key Contributions
- SEAL watermarking framework that aligns hidden representations of anchor samples with predefined orthogonal bit vectors, embedding multi-bit ownership signatures in LLM latent space
- Supports both white-box and black-box verification while preserving the model's original factual predictions, rendering the watermark functionally harmless and stealthy
- Demonstrated superiority over 11 competing fingerprinting/watermarking baselines across six LLMs, and robustness against knowledgeable adversaries who know both the mechanism and the embedded signatures
🛡️ Threat Analysis
SEAL embeds multi-bit signatures directly into the model's latent representational space (model weights/hidden states) to prove ownership of a stolen LLM — classic model watermarking as a defense against model theft. It supports both white-box and black-box verification and is evaluated against adversaries attempting to remove or defeat the watermark.
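Robustness to removal attempts can be illustrated in the same toy setting by perturbing the marked hidden state — a crude proxy for fine-tuning or distillation drift (an assumed noise model, not the paper's attack) — and checking that the decoded bits still agree with the signature far above the 50% chance level an unrelated model would exhibit:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 64, 16
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
bit_vectors = Q.T  # orthonormal bit vectors, shape (k, d)
signature = rng.integers(0, 2, size=k)
signs = 2 * signature - 1
h_marked = rng.standard_normal(d) + 6.0 * (signs @ bit_vectors)

def bit_agreement(h):
    """Fraction of signature bits recovered from hidden state h."""
    return ((bit_vectors @ h > 0).astype(int) == signature).mean()

# Post-processing proxy: Gaussian drift of the hidden state, standing
# in for fine-tuning noise (assumed attack model).
h_attacked = h_marked + rng.standard_normal(d)

# Ownership decision: agreement well above the ~50% chance level that
# an unrelated, unmarked model's hidden state would show.
threshold = 0.8
print(bit_agreement(h_attacked) >= threshold)   # suspect (marked) model
print(bit_agreement(rng.standard_normal(d)))    # unrelated model, near 0.5
```

The margin `alpha = 6.0` dominates the drift here, so the signature survives the perturbation; an unrelated hidden state has no preferred alignment with the anchored subspace and decodes to roughly coin-flip agreement.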