defense 2026

SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

Aiman Al Masoud , Marco Arazzi , Antonino Nocera

1 citations · 23 references · arXiv

α

Published on arXiv

2601.11199

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves up to 58% improvement in privacy score over baseline RAG approaches while maintaining strong resilience to prompt injection attacks targeting the generative model

SD-RAG

Novel technique introduced


Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained through efficient retrieval mechanisms over large-scale data collections. Currently, the majority of existing approaches overlook the risks associated with exposing sensitive or access-controlled information directly to the generation model. Only a few approaches propose techniques to instruct the generative model to refrain from disclosing sensitive information; however, recent studies have also demonstrated that LLMs remain vulnerable to prompt injection attacks that can override intended behavioral constraints. For these reasons, we propose a novel approach to Selective Disclosure in Retrieval-Augmented Generation, called SD-RAG, which decouples the enforcement of security and privacy constraints from the generation process itself. Rather than relying on prompt-level safeguards, SD-RAG applies sanitization and disclosure controls during the retrieval phase, prior to augmenting the language model's input. Moreover, we introduce a semantic mechanism to allow the ingestion of human-readable dynamic security and privacy constraints together with an optimized graph-based data model that supports fine-grained, policy-aware retrieval. Our experimental evaluation demonstrates the superiority of SD-RAG over baseline existing approaches, achieving up to a $58\%$ improvement in the privacy score, while also showing a strong resilience to prompt injection attacks targeting the generative model.


Key Contributions

  • SD-RAG framework that decouples security/privacy enforcement from LLM generation by sanitizing retrieved content before it reaches the generative model, providing prompt-injection resilience
  • Graph-based data model that co-represents corpus content and human-readable dynamic security/privacy constraints, enabling fine-grained, policy-aware retrieval
  • Automated evaluation methodology and novel metrics for redaction-aware closed-question answering (RCQA) to assess privacy preservation and utility

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time
Datasets
custom RCQA evaluation dataset
Applications
retrieval-augmented generationenterprise knowledge managementprivacy-sensitive llm deployments