benchmark 2026

Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation

Zhisheng Qi ¹, Utkarsh Sahu ¹, Li Ma ², Haoyu Han ², Ryan Rossi ³, Franck Dernoncourt ³, Mahantesh Halappanavar ⁴, Nesreen Ahmed ⁵, Yushun Dong ⁶, Yue Zhao ⁷, Yu Zhang ⁸, Yu Wang ¹

¹ University of Oregon

² Michigan State University

³ Adobe Research

⁴ PNNL

⁵ Cisco AI Research

⁶ Florida State University

⁷ University of Southern California

⁸ Texas A&M University

0 citations · 55 references · arXiv (Cornell University)

Published on arXiv

2602.09319

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

The benchmark consolidates the fragmented RAG knowledge-extraction landscape into a unified framework, yielding actionable insights for building privacy-preserving RAG systems.

Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications, including enterprise chatbots, healthcare assistants, and agentic memory management. However, recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries, raising serious concerns about intellectual property theft and privacy leakage. While prior work has explored individual attack and defense techniques, the research landscape remains fragmented, spanning heterogeneous retrieval embeddings, diverse generation models, and evaluations based on non-standardized metrics and inconsistent datasets. To address this gap, we introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems. Our benchmark covers a broad spectrum of attack and defense strategies, representative retrieval embedding models, and both open- and closed-source generators, all evaluated under a unified experimental framework with standardized protocols across multiple datasets. By consolidating the experimental landscape and enabling reproducible, comparable evaluation, this benchmark provides actionable insights and a practical foundation for developing privacy-preserving RAG systems in the face of emerging knowledge extraction threats. Our code is available here.

Key Contributions

First systematic benchmark for knowledge-extraction attacks and defenses on RAG systems, unifying previously fragmented evaluation landscapes
Covers a broad spectrum of attack and defense strategies across heterogeneous retrieval embedding models and open-/closed-source generators
Standardized experimental protocols across multiple datasets enabling reproducible and comparable evaluation of RAG privacy threats
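To make the idea of a standardized, comparable leakage measurement concrete, here is a minimal sketch of one possible extraction-rate metric. It is an illustration only and is not taken from the benchmark: the function name `extraction_rate`, the token-overlap criterion, and the `min_overlap` threshold are all assumptions chosen for simplicity.

```python
# Hypothetical metric, not the benchmark's own definition: a knowledge-base
# chunk counts as "extracted" if a large share of its tokens appears in at
# least one single model response.

def extraction_rate(knowledge_base, responses, min_overlap=0.8):
    """Return the fraction of knowledge-base chunks recovered by the responses."""

    def tokens(text):
        # Crude whitespace tokenization; a real evaluation would likely
        # normalize punctuation or use embedding similarity instead.
        return set(text.lower().split())

    if not knowledge_base:
        return 0.0

    response_tokens = [tokens(r) for r in responses]
    leaked = 0
    for chunk in knowledge_base:
        chunk_tokens = tokens(chunk)
        if not chunk_tokens:
            continue  # an empty chunk can never count as leaked
        # Best coverage of this chunk achieved by any single response.
        best = max(
            (len(chunk_tokens & rt) / len(chunk_tokens) for rt in response_tokens),
            default=0.0,
        )
        if best >= min_overlap:
            leaked += 1
    return leaked / len(knowledge_base)


kb = ["patient record alpha beta", "secret formula gamma delta"]
outputs = ["Sure: patient record alpha beta", "I cannot help"]
print(extraction_rate(kb, outputs))
```

Fixing a single metric like this across attacks, defenses, retrievers, and generators is what would make results comparable; the threshold and the tokenization are the obvious knobs that a real protocol would need to pin down.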

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

black_box, inference_time

Applications

enterprise chatbots, healthcare assistants, agentic memory management, retrieval-augmented generation

