benchmark 2026

Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation

Zhisheng Qi 1, Utkarsh Sahu 1, Li Ma 2, Haoyu Han 2, Ryan Rossi 3, Franck Dernoncourt 3, Mahantesh Halappanavar 4, Nesreen Ahmed 5, Yushun Dong 6, Yue Zhao 7, Yu Zhang 8, Yu Wang 1

0 citations · 55 references · arXiv (Cornell University)

α

Published on arXiv

2602.09319

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

The benchmark consolidates the fragmented RAG knowledge-extraction landscape into a unified framework, yielding actionable insights for building privacy-preserving RAG systems.


Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications, including enterprise chatbots, healthcare assistants, and agentic memory management. However, recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries, raising serious concerns about intellectual property theft and privacy leakage. While prior work has explored individual attack and defense techniques, the research landscape remains fragmented, spanning heterogeneous retrieval embeddings, diverse generation models, and evaluations based on non-standardized metrics and inconsistent datasets. To address this gap, we introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems. Our benchmark covers a broad spectrum of attack and defense strategies, representative retrieval embedding models, and both open- and closed-source generators, all evaluated under a unified experimental framework with standardized protocols across multiple datasets. By consolidating the experimental landscape and enabling reproducible, comparable evaluation, this benchmark provides actionable insights and a practical foundation for developing privacy-preserving RAG systems in the face of emerging knowledge extraction threats. Our code is available here.


Key Contributions

  • First systematic benchmark for knowledge-extraction attacks and defenses on RAG systems, unifying previously fragmented evaluation landscapes
  • Covers a broad spectrum of attack and defense strategies across heterogeneous retrieval embedding models and open-/closed-source generators
  • Standardized experimental protocols across multiple datasets enabling reproducible and comparable evaluation of RAG privacy threats

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llmtransformer
Threat Tags
black_boxinference_time
Applications
enterprise chatbotshealthcare assistantsagentic memory managementretrieval-augmented generation