survey arXiv Aug 15, 2025 · Aug 2025
Zhenhua Xu, Xubin Yue, Zhebo Wang et al. · Zhejiang University · GenTel.io
Surveys LLM copyright protection: text watermarking, model fingerprinting, fingerprint transfer/removal, and IP ownership verification
Model Theft Output Integrity Attack Model Theft nlp
Copyright protection for large language models is of critical importance, given their substantial development costs, proprietary value, and potential for misuse. Existing surveys have predominantly focused on techniques for tracing LLM-generated content-namely, text watermarking-while a systematic exploration of methods for protecting the models themselves (i.e., model watermarking and model fingerprinting) remains absent. Moreover, the relationships and distinctions among text watermarking, model watermarking, and model fingerprinting have not been comprehensively clarified. This work presents a comprehensive survey of the current state of LLM copyright protection technologies, with a focus on model fingerprinting, covering the following aspects: (1) clarifying the conceptual connection from text watermarking to model watermarking and fingerprinting, and adopting a unified terminology that incorporates model watermarking into the broader fingerprinting framework; (2) providing an overview and comparison of diverse text watermarking techniques, highlighting cases where such methods can function as model fingerprinting; (3) systematically categorizing and comparing existing model fingerprinting approaches for LLM copyright protection; (4) presenting, for the first time, techniques for fingerprint transfer and fingerprint removal; (5) summarizing evaluation metrics for model fingerprints, including effectiveness, harmlessness, robustness, stealthiness, and reliability; and (6) discussing open challenges and future research directions. This survey aims to offer researchers a thorough understanding of both text watermarking and model fingerprinting technologies in the era of LLMs, thereby fostering further advances in protecting their intellectual property.
llm transformer Zhejiang University · GenTel.io
defense arXiv Aug 31, 2025 · Aug 2025
Zhenhua Xu, Zhaokun Yan, Binhan Xu et al. · Zhejiang University · China Academy of Information and Communications Technology +3 more
Embeds backdoor ownership fingerprints into LoRA adapters for lightweight, transferable LLM IP protection across downstream models
Model Theft Model Theft nlp
With the rapid advancement of large language models (LLMs), safeguarding intellectual property (IP) has become increasingly critical. To address the challenges of high costs and potential contamination in fingerprint integration, we propose LoRA-FP, a lightweight, plug-and-play framework that embeds backdoor fingerprints into LoRA adapters through constrained fine-tuning. This design enables seamless fingerprint transplantation via parameter fusion, eliminating the need for full-parameter updates while preserving model integrity. Experimental results demonstrate that LoRA-FP not only significantly reduces computational overhead compared to conventional approaches but also achieves superior robustness across diverse scenarios, including incremental training and model fusion. Our code and datasets are publicly available at https://github.com/Xuzhenhua55/LoRA-FP.
llm transformer Zhejiang University · China Academy of Information and Communications Technology · GenTel.io +2 more
defense arXiv Sep 3, 2025 · Sep 2025
Zhenhua Xu, Meng Han, Wenpeng Xing · Zhejiang University · GenTel.io
Detects stolen LLMs via memorization-based probabilistic fingerprints that remain stealthy and robust under gray-box API access
Model Theft Model Theft nlp
The proliferation of large language models (LLMs) has intensified concerns over model theft and license violations, necessitating robust and stealthy ownership verification. Existing fingerprinting methods either require impractical white-box access or introduce detectable statistical anomalies. We propose EverTracer, a novel gray-box fingerprinting framework that ensures stealthy and robust model provenance tracing. EverTracer is the first to repurpose Membership Inference Attacks (MIAs) for defensive use, embedding ownership signals via memorization instead of artificial trigger-output overfitting. It consists of Fingerprint Injection, which fine-tunes the model on any natural language data without detectable artifacts, and Verification, which leverages calibrated probability variation signal to distinguish fingerprinted models. This approach remains robust against adaptive adversaries, including input level modification, and model-level modifications. Extensive experiments across architectures demonstrate EverTracer's state-of-the-art effectiveness, stealthness, and resilience, establishing it as a practical solution for securing LLM intellectual property. Our code and data are publicly available at https://github.com/Xuzhenhua55/EverTracer.
llm transformer Zhejiang University · GenTel.io
defense arXiv Sep 5, 2025 · Sep 2025
Zhenhua Xu, Xixiang Zhao, Xubin Yue et al. · Zhejiang University · The Hong Kong Polytechnic University +1 more
Embeds verifiable LLM ownership fingerprints via multi-turn contextual backdoors resistant to perplexity detection and adversarial fine-tuning
Model Theft Model Theft nlp
The widespread deployment of large language models (LLMs) has intensified concerns around intellectual property (IP) protection, as model theft and unauthorized redistribution become increasingly feasible. To address this, model fingerprinting aims to embed verifiable ownership traces into LLMs. However, existing methods face inherent trade-offs between stealthness, robustness, and generalizability, being either detectable via distributional shifts, vulnerable to adversarial modifications, or easily invalidated once the fingerprint is revealed. In this work, we introduce CTCC, a novel rule-driven fingerprinting framework that encodes contextual correlations across multiple dialogue turns, such as counterfactual, rather than relying on token-level or single-turn triggers. CTCC enables fingerprint verification under black-box access while mitigating false positives and fingerprint leakage, supporting continuous construction under a shared semantic rule even if partial triggers are exposed. Extensive experiments across multiple LLM architectures demonstrate that CTCC consistently achieves stronger stealth and robustness than prior work. Our findings position CTCC as a reliable and practical solution for ownership verification in real-world LLM deployment scenarios. Our code and data are publicly available at <https://github.com/Xuzhenhua55/CTCC>.
llm transformer Zhejiang University · The Hong Kong Polytechnic University · GenTel.io
defense arXiv Aug 31, 2025 · Aug 2025
Xubin Yue, Zhenhua Xu, Wenpeng Xing et al. · Zhejiang University · GenTel.io +1 more
Embeds ownership fingerprints in LLM parameter offsets via dual-channel knowledge editing, resisting fine-tuning erasure and feature-space defenses
Model Theft Model Theft nlp
Addressing the intellectual property protection challenges in commercial deployment of large language models (LLMs), existing black-box fingerprinting techniques face dual challenges from incremental fine-tuning erasure and feature-space defense due to their reliance on overfitting high-perplexity trigger patterns. Recent work has revealed that model editing in the fingerprinting domain offers distinct advantages, including significantly lower false positive rates, enhanced harmlessness, and superior robustness. Building on this foundation, this paper innovatively proposes a $\textbf{Pr}$efix-$\textbf{e}$nhanced Fingerprint $\textbf{E}$diting Framework (PREE), which encodes copyright information into parameter offsets through dual-channel knowledge edit to achieve covert embedding of fingerprint features. Experimental results demonstrate that the proposed solution achieves the 90\% trigger precision in mainstream architectures including LLaMA-3 and Qwen-2.5. The minimal parameter offset (change rate < 0.03) effectively preserves original knowledge representation while demonstrating strong robustness against incremental fine-tuning and multi-dimensional defense strategies, maintaining zero false positive rate throughout evaluations.
llm transformer Zhejiang University · GenTel.io · Guangzhou University
attack arXiv Sep 1, 2025 · Sep 2025
Dezhang Kong, Hujin Peng, Yilun Zhang et al. · Zhejiang University · Changsha University of Science and Technology +4 more
Attacks LLM multi-agent systems via manipulated web links using homoglyph, subdirectory, and obfuscation techniques
Insecure Plugin Design Excessive Agency nlp
With the proliferation of LLM-driven multi-agent systems (MAS), the security of Web links has become a critical concern. Once MAS is induced to trust a malicious link, attackers can use it as a springboard to expand the attack surface. In this paper, we propose Web Fraud Attacks, a novel type of attack manipulating unique structures of web links to deceive MAS. We design 12 representative attack variants that encompass various methods, such as homoglyph deception, sub-directory nesting, and parameter obfuscation. Through extensive experiments on these attack vectors, we demonstrate that Web fraud attacks not only exhibit significant destructive potential across different MAS architectures but also possess a distinct advantage in evasion: they circumvent the need for complex input design, lowering the threshold for attacks significantly. These results underscore the importance of addressing Web fraud attacks, providing new insights into MAS safety. Our code is available at https://github.com/JiangYingEr/Web-Fraud-Attack-in-MAS.
llm Zhejiang University · Changsha University of Science and Technology · Purdue University +3 more