EditMark: Watermarking Large Language Models based on Model Editing
Shuai Li 1, Kejiang Chen 1, Jun Jiang 1, Jie Zhang 2, Qiyi Yao 1, Kai Zeng 3, Weiming Zhang 1, Nenghai Yu 1
Published on arXiv (2510.16367)
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
Achieves a 100% watermark extraction success rate for 32-bit watermarks embedded in LLMs within 20 seconds, while maintaining model fidelity and stealthiness against common attacks.
EditMark
Novel technique introduced
Large Language Models (LLMs) have demonstrated remarkable capabilities, but their training requires extensive data and computational resources, rendering them valuable digital assets. It is therefore essential to watermark LLMs to protect their copyright and trace unauthorized use or resale. Existing methods for watermarking LLMs primarily rely on training LLMs on a watermarked dataset, which incurs burdensome training costs and degrades the LLM's performance; moreover, their watermarked texts are neither logical nor natural, reducing the stealthiness of the watermark. To address these issues, we propose EditMark, the first watermarking method that leverages model editing to embed a training-free, stealthy, and performance-lossless watermark in LLMs. We observe that some questions have multiple correct answers, so we assign each answer a unique watermark and update the weights of the LLM to generate the corresponding question-answer pairs through model editing. We further refine the model editing technique to meet the requirements of watermark embedding: an adaptive multi-round stable editing strategy, coupled with the injection of a noise matrix, improves both the effectiveness and robustness of the embedding. Extensive experiments indicate that EditMark can embed 32-bit watermarks into LLMs within 20 seconds (fine-tuning: 6875 seconds) with a 100% watermark extraction success rate, demonstrating its effectiveness and efficiency. Further experiments demonstrate that EditMark preserves fidelity and stealthiness and offers a degree of robustness against common attacks.
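The core encoding idea can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the paper's implementation: it assumes a toy set of multiple-answer (MA) questions in which each of the k equally correct answers encodes floor(log2(k)) watermark bits; model editing would then update the LLM's weights so that it deterministically produces the selected answers. All names and questions here are hypothetical.

```python
# Illustrative sketch of EditMark's encoding step (assumed, not from the paper):
# each MA question has k equally correct answers, and fixing which answer the
# model gives encodes floor(log2(k)) bits of the watermark.

MA_QUESTIONS = {
    # question -> list of equally correct answers (list order defines the code)
    "Name a prime number less than 10.": ["2", "3", "5", "7"],
    "Name a primary color.": ["red", "green", "blue", "yellow"],
}

def bits_per_question(answers):
    """Number of watermark bits one MA question can carry: floor(log2(k))."""
    return len(answers).bit_length() - 1

def encode(watermark_bits, questions):
    """For each question, pick the answer that encodes the next bits of the
    watermark. Model editing would then make the LLM give exactly these
    answers; here we only compute the target answers."""
    targets, pos = {}, 0
    for q, answers in questions.items():
        n = bits_per_question(answers)
        idx = int(watermark_bits[pos:pos + n], 2)  # next n bits -> answer index
        targets[q] = answers[idx]
        pos += n
    return targets
```

With two 4-answer questions, `encode("1001", MA_QUESTIONS)` assigns the bits "10" to the first question (answer "5") and "01" to the second (answer "green"); a 32-bit watermark would simply use sixteen such questions.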
Key Contributions
- First training-free LLM watermarking method using model editing — embeds 32-bit watermarks in ~20 seconds vs. 6875 seconds for fine-tuning baselines
- Adaptive multi-round stable editing strategy with noise matrix injection to improve watermark robustness and effectiveness
- Leverages multiple-answer (MA) questions so watermarked outputs remain logically coherent, achieving stealthiness and performance-losslessness simultaneously
🛡️ Threat Analysis
EditMark watermarks the model itself: its weights are modified via model editing to embed a verifiable ownership signal, explicitly to protect LLM intellectual property against unauthorized resale and redistribution. Because the watermark resides in the model weights rather than in generated text outputs, this is a model ownership protection scheme (ML05), not an output provenance mechanism (ML09).