defense 2025

Provably Secure Retrieval-Augmented Generation

Pengcheng Zhou, Yinglun Feng, Zhongliang Yang

Published on arXiv (2508.01084)

Data Poisoning Attack (OWASP ML Top 10 — ML02)

Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)

Key Finding

The SAG framework provides formally proven confidentiality and integrity guarantees for RAG systems, effectively resisting data leakage and poisoning attacks across multiple benchmark datasets while maintaining retrieval and generation performance.

SAG (Secure Augmented Generation)

Novel technique introduced


Although Retrieval-Augmented Generation (RAG) systems are widely deployed, the privacy and security risks they face, such as data leakage and data poisoning, have yet to be systematically addressed. Existing defense strategies rely primarily on heuristic filtering or on hardening the retriever, and they suffer from limited interpretability, a lack of formal security guarantees, and vulnerability to adaptive attacks. To address these challenges, this paper proposes the first provably secure framework for RAG systems (SAG). Our framework employs a pre-storage full-encryption scheme that protects both the retrieved content and its vector embeddings, guaranteeing that only authorized entities can access the data. Through formal security proofs, we rigorously verify the scheme's confidentiality and integrity under a computational security model. Extensive experiments across multiple benchmark datasets demonstrate that our framework effectively resists a range of state-of-the-art attacks. This work establishes a theoretical foundation and a practical paradigm for verifiably secure RAG systems, advancing AI-powered services toward formally guaranteed security.


Key Contributions

  • First provably secure RAG framework (SAG) with formal confidentiality and integrity proofs under a computational security model
  • Pre-storage full-encryption scheme protecting both retrieved text content and vector embeddings, ensuring only authorized entities can access or modify the knowledge base
  • Empirical validation across diverse benchmark datasets showing effective resistance to state-of-the-art RAG attacks while preserving retrieval precision and generation quality
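The pre-storage full-encryption idea can be illustrated with a short sketch: both the chunk text and its embedding are encrypted with an authenticated cipher before they ever reach the vector store, so an unauthorized reader sees only ciphertext. This is a minimal sketch assuming AES-256-GCM via the third-party `cryptography` package; the function names (`encrypt_entry`, `decrypt_entry`) are illustrative, not the paper's actual API or scheme.

```python
# Hypothetical sketch of pre-storage full encryption for a RAG entry.
# Both the text and the embedding are encrypted before storage; the
# doc_id is bound as associated data so ciphertexts cannot be swapped
# between entries. Illustrative only -- not SAG's concrete construction.
import os
import struct

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def pack_embedding(vec: list[float]) -> bytes:
    """Serialize an embedding as little-endian float32 for encryption."""
    return struct.pack(f"<{len(vec)}f", *vec)


def encrypt_entry(key: bytes, doc_id: str, text: str, vec: list[float]) -> dict:
    """Encrypt text and embedding under one key; nonce is prepended to each ciphertext."""
    aead = AESGCM(key)
    nonce_t, nonce_v = os.urandom(12), os.urandom(12)
    aad = doc_id.encode()
    return {
        "doc_id": doc_id,
        "text_ct": nonce_t + aead.encrypt(nonce_t, text.encode(), aad),
        "vec_ct": nonce_v + aead.encrypt(nonce_v, pack_embedding(vec), aad),
    }


def decrypt_entry(key: bytes, entry: dict) -> tuple[str, list[float]]:
    """Only holders of `key` can recover the plaintext text and vector."""
    aead = AESGCM(key)
    aad = entry["doc_id"].encode()
    text = aead.decrypt(entry["text_ct"][:12], entry["text_ct"][12:], aad)
    raw = aead.decrypt(entry["vec_ct"][:12], entry["vec_ct"][12:], aad)
    return text.decode(), list(struct.unpack(f"<{len(raw) // 4}f", raw))


key = AESGCM.generate_key(bit_length=256)
entry = encrypt_entry(key, "doc-1", "patient note", [0.25, -1.5, 3.0])
text, vec = decrypt_entry(key, entry)
```

Because GCM is an authenticated mode, decryption also fails loudly on any tampered ciphertext, which already gives an integrity check on stored entries in addition to confidentiality.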

🛡️ Threat Analysis

Data Poisoning Attack

Paper explicitly defends against data poisoning of the RAG knowledge base — malicious corpus providers injecting false or manipulative content — through cryptographic integrity verification with formal security proofs.
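The integrity-verification idea behind this defense can be sketched with a standard message authentication code: each stored passage carries a tag computed under a curator-held key, and retrieval rejects any entry whose tag no longer matches, so injected or altered passages are discarded before they can reach the generator. A stdlib-only sketch under that assumption (the helper names are hypothetical, not SAG's API):

```python
# Illustrative integrity check for knowledge-base entries using HMAC-SHA256.
# A poisoned or modified passage cannot produce a valid tag without the
# curator's key, so it is filtered out at retrieval time.
import hashlib
import hmac


def tag_entry(key: bytes, doc_id: str, text: str) -> str:
    """MAC over (doc_id, text); the separator prevents boundary ambiguity."""
    msg = doc_id.encode() + b"\x00" + text.encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def verify_entry(key: bytes, doc_id: str, text: str, tag: str) -> bool:
    """Constant-time comparison against the stored tag."""
    return hmac.compare_digest(tag_entry(key, doc_id, text), tag)


key = b"curator-secret-key"  # hypothetical curator key, for illustration
tag = tag_entry(key, "doc-7", "aspirin reduces fever")

ok = verify_entry(key, "doc-7", "aspirin reduces fever", tag)        # authentic entry
poisoned = verify_entry(key, "doc-7", "take 100 tablets daily", tag)  # injected content
```

An adversarial corpus provider without the key cannot forge a valid tag, which is the kind of guarantee the paper then establishes formally rather than heuristically.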


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, training_time, white_box
Datasets
Enron Email, HealthCareMagic 100k, BillSum, FNSPID
Applications
retrieval-augmented generation, llm knowledge bases, medical consultation, financial advising, legal research