defense arXiv Nov 13, 2025 · Nov 2025
Qinfeng Li, Miao Pan, Jintao Chen et al. · Zhejiang University · Ningbo Global Innovation Center +2 more
Defends open-source LLMs from unauthorized model merging by disrupting Linear Mode Connectivity between homologous model weights
Model Theft nlp
Model merging has emerged as an efficient technique for expanding large language models (LLMs) by integrating specialized expert models. However, it also introduces a new threat: model merging stealing, where free-riders exploit models through unauthorized model merging. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify three critical protection properties that existing methods fail to simultaneously satisfy: (1) proactively preventing unauthorized merging; (2) ensuring compatibility with general open-source settings; (3) achieving high security with negligible performance loss. To address the above issues, we propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. The core design of MergeBarrier is to disrupt the Linear Mode Connectivity (LMC) between the protected model and its homologous counterparts, thereby eliminating the low-loss path required for effective model merging. Extensive experiments show that MergeBarrier effectively prevents model merging stealing with negligible accuracy loss.
llm transformer Zhejiang University · Ningbo Global Innovation Center · Ant Group +1 more
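For the MergeBarrier entry above, the notion of Linear Mode Connectivity can be illustrated with a toy sketch. The code below is not the paper's method: it only shows linear weight interpolation (the simplest form of merging) and one common way to measure a loss barrier along the straight path between two parameter vectors. The function names (`interpolate`, `loss_barrier`) and the two toy losses (`bowl`, `double_well`) are hypothetical, chosen for illustration only.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate(theta_a: Sequence[float], theta_b: Sequence[float], alpha: float) -> Vector:
    """Linear weight merge: (1 - alpha) * theta_a + alpha * theta_b."""
    if len(theta_a) != len(theta_b):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(
    loss_fn: Callable[[Sequence[float]], float],
    theta_a: Sequence[float],
    theta_b: Sequence[float],
    steps: int = 21,
) -> float:
    """Largest excess of the loss on the straight path over the
    linearly interpolated endpoint losses (never below 0.0)."""
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        path_loss = loss_fn(interpolate(theta_a, theta_b, alpha))
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, path_loss - baseline)
    return barrier


def bowl(theta: Sequence[float]) -> float:
    """Toy convex loss: every straight path stays low, so merging is safe."""
    return sum(t * t for t in theta)


def double_well(theta: Sequence[float]) -> float:
    """Toy loss with two minima (at -1 and +1) separated by a ridge at 0."""
    return sum((t * t - 1.0) ** 2 for t in theta)


# Two solutions in the same basin: no barrier, an averaged model works.
connected = loss_barrier(bowl, [1.0, 0.0], [0.0, 1.0])

# Two solutions in different basins: the midpoint sits on the ridge.
disconnected = loss_barrier(double_well, [-1.0], [1.0])

print(f"barrier (connected):    {connected:.3f}")
print(f"barrier (disconnected): {disconnected:.3f}")
```

In this toy setting, a barrier near zero corresponds to the low-loss path that makes weight averaging effective, while a large barrier means the averaged weights land in a high-loss region. The abstract describes MergeBarrier as aiming for the second situation between the protected model and its homologous counterparts; how it achieves this is not shown here.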
defense arXiv Nov 13, 2025 · Nov 2025
Qinfeng Li, Miao Pan, Ke Xiong et al. · Zhejiang University · Ant Group +3 more
Defends RAG systems against proprietary knowledge base extraction attacks using dual-path contrastive reindexing and constrained cascade generation
Sensitive Information Disclosure nlp
Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths, progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining "contrastive reindexing" for inter-class isolation and "constrained cascade generation" for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering comprehensive defense against knowledge base extraction attacks.
llm transformer Zhejiang University · Ant Group · Universal Identification Technology +2 more