Defense · 2025

SEAL: Subspace-Anchored Watermarks for LLM Ownership

Yanbo Dai, Zongjie Li, Zhenlan Ji, Shuai Wang

0 citations · arXiv


Published on arXiv · 2511.11356

Model Theft · OWASP ML Top 10 — ML05

Model Theft · OWASP LLM Top 10 — LLM10

Key Finding

SEAL maintains strong ownership verification performance even when adversaries possess full knowledge of the watermarking mechanism and embedded signatures, outperforming 11 existing fingerprinting and watermarking methods in effectiveness, fidelity, efficiency, and robustness.

SEAL

Novel technique introduced


Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level performance in text generation, reasoning, and question answering. However, training such models requires substantial computational resources, large curated datasets, and sophisticated alignment procedures. As a result, they constitute highly valuable intellectual property (IP) assets that warrant robust protection mechanisms. Existing IP protection approaches suffer from critical limitations. Model fingerprinting techniques can identify model architectures but fail to establish ownership of specific model instances. In contrast, traditional backdoor-based watermarking methods embed behavioral anomalies that can be easily removed through common post-processing operations such as fine-tuning or knowledge distillation. We propose SEAL, a subspace-anchored watermarking framework that embeds multi-bit signatures directly into the model's latent representational space, supporting both white-box and black-box verification scenarios. Our approach leverages model editing techniques to align the hidden representations of selected anchor samples with predefined orthogonal bit vectors. This alignment embeds the watermark while preserving the model's original factual predictions, rendering the watermark functionally harmless and stealthy. We conduct comprehensive experiments on multiple benchmark datasets and six prominent LLMs, comparing SEAL with 11 existing fingerprinting and watermarking methods to demonstrate its superior effectiveness, fidelity, efficiency, and robustness. Furthermore, we evaluate SEAL under potential knowledgeable attacks and show that it maintains strong verification performance even when adversaries possess knowledge of the watermarking mechanism and the embedded signatures.


Key Contributions

  • SEAL watermarking framework that aligns hidden representations of anchor samples with predefined orthogonal bit vectors, embedding multi-bit ownership signatures in LLM latent space
  • Supports both white-box and black-box verification, making the watermark functionally stealthy while preserving factual model predictions
  • Outperforms 11 competing fingerprinting/watermarking baselines across six LLMs, and remains robust against knowledgeable adversaries who know both the mechanism and the embedded signatures

🛡️ Threat Analysis

Model Theft

SEAL embeds multi-bit signatures directly into the model's latent representational space (model weights/hidden states) to prove ownership of a stolen LLM — classic model watermarking as a defense against model theft. Supports both white-box and black-box verification and is evaluated against adversaries attempting to remove or defeat the watermark.
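The robustness claim — that the watermark survives adversarial post-processing — can be probed with a toy simulation: perturb an aligned hidden representation with noise (standing in for the drift caused by fine-tuning or distillation) and check how many signature bits still decode correctly. The noise model and parameters below are illustrative assumptions, not the paper's attack setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_bits = 64, 32

# Orthonormal watermark subspace and signed signature (as in a
# hypothetical SEAL-style embedding; construction assumed, not from the paper).
Q, _ = np.linalg.qr(rng.standard_normal((d, n_bits)))
B = Q.T                                   # (n_bits, d), orthonormal rows
signs = rng.choice([-1.0, 1.0], n_bits)
h = signs @ B                             # ideally aligned hidden state

# Model post-processing (fine-tuning, distillation) is crudely modeled
# as additive Gaussian noise on the hidden representation.
noisy = h + 0.3 * rng.standard_normal(d)
recovered = np.sign(B @ noisy)
match_rate = (recovered == signs).mean()  # fraction of bits still correct
```

Because each bit is read from the sign of a projection with magnitude 1, a perturbation must move that projection past zero to flip the bit, which gives the subspace encoding a built-in noise margin.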


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, black_box, training_time
Datasets
multiple benchmark datasets (unspecified in excerpt)
Applications
llm ip protection, model ownership verification