defense 2025

RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation

0 citations · arXiv

Published on arXiv: 2511.10128

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

RAGFort significantly reduces knowledge base reconstruction success across both intra-class and inter-class extraction paths while maintaining answer quality in RAG systems.

RAGFort

Novel technique introduced

Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths, progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining "contrastive reindexing" for inter-class isolation and "constrained cascade generation" for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering comprehensive defense against knowledge base extraction attacks.
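To make the threat concrete, the following is a minimal sketch of how an attacker's progress could be scored: the fraction of knowledge base chunks that show up in the aggregated responses. The function name `reconstruction_coverage` and the verbatim-substring matching rule are assumptions made for illustration; the abstract does not state which metric the paper uses.

```python
def reconstruction_coverage(kb_chunks, responses):
    """Fraction of knowledge-base chunks that appear verbatim
    (case-insensitive) in at least one collected response.

    Hypothetical metric for illustration only; not taken from the paper.
    """
    if not kb_chunks:
        return 0.0
    lowered = [r.lower() for r in responses]
    # A chunk counts as leaked if any single response contains it.
    leaked = sum(
        1 for chunk in kb_chunks
        if any(chunk.lower() in r for r in lowered)
    )
    return leaked / len(kb_chunks)


if __name__ == "__main__":
    kb = ["alpha beta", "gamma delta", "epsilon"]
    collected = ["The document says Alpha Beta.", "Also, epsilon applies."]
    print(reconstruction_coverage(kb, collected))
```

A defense in the sense of the abstract would aim to keep this kind of score low as the number of attacker queries grows, while ordinary answers remain useful.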

Key Contributions

Systematic analysis showing that defending only one extraction path (intra-class or inter-class) is insufficient and that joint protection is essential
Contrastive reindexing module that reorganizes the dense retrieval index using HDBSCAN clustering to enforce semantic separation between topic classes, blocking the inter-class extraction path
Constrained cascade generation module that protects the intra-class path by reducing fine-grained content leakage within individual topics while preserving answer quality
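The idea of inter-class isolation can be illustrated with a toy retriever that only returns documents from the topic class closest to the query. This is a sketch under stated assumptions, not the paper's contrastive reindexing: the cluster labels are taken as given (the paper obtains its classes with HDBSCAN), and the helper names `centroids` and `retrieve_within_class` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroids(embeddings, labels):
    """Mean embedding per class label (labels assumed precomputed)."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def retrieve_within_class(query, embeddings, labels, k=2):
    """Pick the class whose centroid is closest to the query, then
    rank only that class's documents. Documents from other classes
    are never returned, even when k exceeds the class size."""
    cents = centroids(embeddings, labels)
    best = max(cents, key=lambda c: cosine(query, cents[c]))
    candidates = [i for i, lab in enumerate(labels) if lab == best]
    candidates.sort(key=lambda i: cosine(query, embeddings[i]), reverse=True)
    return best, candidates[:k]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    classes = [0, 0, 1, 1]
    print(retrieve_within_class([0.6, 0.4], docs, classes, k=10))
```

Restricting results to one class means a query near a topic boundary cannot pull in neighbors from a semantically related class, which is the inter-class path described above. It does nothing for leakage within a class, which is why the paper pairs it with a separate intra-class module.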

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

black_box, inference_time

Applications

retrieval-augmented generation, proprietary knowledge base protection, question answering

Similar Papers

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors

Private-RAG: Answering Multiple Queries with LLMs while Keeping Your Data Private

Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

Towards Confidential and Efficient LLM Inference with Dual Privacy Protection

Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation

AlienLM: Alienization of Language for API-Boundary Privacy in Black-Box LLMs

Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

Privacy Preserving In-Context-Learning Framework for Large Language Models