defense 2025

Provably Secure Retrieval-Augmented Generation

Pengcheng Zhou, Yinglun Feng, Zhongliang Yang

Published on arXiv (2508.01084)

Data Poisoning Attack (OWASP ML Top 10 — ML02)

Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)

Key Finding

The SAG framework provides formally proven confidentiality and integrity guarantees for RAG systems, effectively resisting data leakage and poisoning attacks across multiple benchmark datasets while maintaining retrieval and generation performance.

SAG (Secure Augmented Generation)

Novel technique introduced


Although Retrieval-Augmented Generation (RAG) systems are widely deployed, the privacy and security risks they face, such as data leakage and data poisoning, have yet to be systematically addressed. Existing defense strategies rely primarily on heuristic filtering or on hardening the retriever, and they suffer from limited interpretability, a lack of formal security guarantees, and vulnerability to adaptive attacks. To address these challenges, this paper proposes the first provably secure framework for RAG systems (SAG). Our framework employs a pre-storage full-encryption scheme that protects both the retrieved content and its vector embeddings, guaranteeing that only authorized entities can access the data. Through formal security proofs, we rigorously verify the scheme's confidentiality and integrity under a computational security model. Extensive experiments across multiple benchmark datasets demonstrate that our framework effectively resists a range of state-of-the-art attacks. This work establishes a theoretical foundation and a practical paradigm for verifiably secure RAG systems, advancing AI-powered services toward formally guaranteed security.


Key Contributions

  • First provably secure RAG framework (SAG) with formal confidentiality and integrity proofs under a computational security model
  • Pre-storage full-encryption scheme protecting both retrieved text content and vector embeddings, ensuring only authorized entities can access or modify the knowledge base
  • Empirical validation across diverse benchmark datasets showing effective resistance to state-of-the-art RAG attacks while preserving retrieval precision and generation quality
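The pre-storage full-encryption idea can be illustrated with a short sketch: both the chunk text and its embedding are encrypted with an authenticated cipher before they ever reach the vector store, so an unauthorized reader sees only ciphertext. This is a minimal sketch assuming AES-256-GCM via the third-party `cryptography` package; the function names (`encrypt_entry`, `decrypt_entry`) are illustrative, not the paper's actual API or scheme.

```python
# Hypothetical sketch of pre-storage full encryption for a RAG entry.
# Both the text and the embedding are encrypted before storage; the
# doc_id is bound as associated data so ciphertexts cannot be swapped
# between entries. Illustrative only -- not SAG's concrete construction.
import os
import struct

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def pack_embedding(vec: list[float]) -> bytes:
    """Serialize an embedding as little-endian float32 for encryption."""
    return struct.pack(f"<{len(vec)}f", *vec)


def encrypt_entry(key: bytes, doc_id: str, text: str, vec: list[float]) -> dict:
    """Encrypt text and embedding under one key; nonce is prepended to each ciphertext."""
    aead = AESGCM(key)
    nonce_t, nonce_v = os.urandom(12), os.urandom(12)
    aad = doc_id.encode()
    return {
        "doc_id": doc_id,
        "text_ct": nonce_t + aead.encrypt(nonce_t, text.encode(), aad),
        "vec_ct": nonce_v + aead.encrypt(nonce_v, pack_embedding(vec), aad),
    }


def decrypt_entry(key: bytes, entry: dict) -> tuple[str, list[float]]:
    """Only holders of `key` can recover the plaintext text and vector."""
    aead = AESGCM(key)
    aad = entry["doc_id"].encode()
    text = aead.decrypt(entry["text_ct"][:12], entry["text_ct"][12:], aad)
    raw = aead.decrypt(entry["vec_ct"][:12], entry["vec_ct"][12:], aad)
    return text.decode(), list(struct.unpack(f"<{len(raw) // 4}f", raw))


key = AESGCM.generate_key(bit_length=256)
entry = encrypt_entry(key, "doc-1", "patient note", [0.25, -1.5, 3.0])
text, vec = decrypt_entry(key, entry)
```

Because GCM is an authenticated mode, decryption also fails loudly on any tampered ciphertext, which already gives an integrity check on stored entries in addition to confidentiality.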

🛡️ Threat Analysis

Data Poisoning Attack

Paper explicitly defends against data poisoning of the RAG knowledge base — malicious corpus providers injecting false or manipulative content — through cryptographic integrity verification with formal security proofs.
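The integrity-verification idea behind this defense can be sketched with a standard message authentication code: each stored passage carries a tag computed under a curator-held key, and retrieval rejects any entry whose tag no longer matches, so injected or altered passages are discarded before they can reach the generator. A stdlib-only sketch under that assumption (the helper names are hypothetical, not SAG's API):

```python
# Illustrative integrity check for knowledge-base entries using HMAC-SHA256.
# A poisoned or modified passage cannot produce a valid tag without the
# curator's key, so it is filtered out at retrieval time.
import hashlib
import hmac


def tag_entry(key: bytes, doc_id: str, text: str) -> str:
    """MAC over (doc_id, text); the separator prevents boundary ambiguity."""
    msg = doc_id.encode() + b"\x00" + text.encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def verify_entry(key: bytes, doc_id: str, text: str, tag: str) -> bool:
    """Constant-time comparison against the stored tag."""
    return hmac.compare_digest(tag_entry(key, doc_id, text), tag)


key = b"curator-secret-key"  # hypothetical curator key, for illustration
tag = tag_entry(key, "doc-7", "aspirin reduces fever")

ok = verify_entry(key, "doc-7", "aspirin reduces fever", tag)        # authentic entry
poisoned = verify_entry(key, "doc-7", "take 100 tablets daily", tag)  # injected content
```

An adversarial corpus provider without the key cannot forge a valid tag, which is the kind of guarantee the paper then establishes formally rather than heuristically.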


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, training_time, white_box
Datasets
Enron Email, HealthCareMagic 100k, BillSum, FNSPID
Applications
retrieval-augmented generation, llm knowledge bases, medical consultation, financial advising, legal research