attack 2025

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He¹,², Yifei Chen¹,², Yiming Li³, Shuo Shao¹,², Leyi Qi¹,², Boheng Li³, Dacheng Tao³, Zhan Qin¹,²

¹ Zhejiang University

² Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

³ Nanyang Technological University

1 citation · 82 references · arXiv

Published on arXiv

2510.02964

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

SECRET extracts 35% of documents from RAG powered by Claude 3.7 Sonnet while all prior attacks achieve 0% extraction, demonstrating practical feasibility of knowledge-base exfiltration

SECRET

Novel technique introduced

In recent years, retrieval-augmented generation (RAG) has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG alleviates issues like outdated knowledge and, crucially, insufficient domain expertise. While effective, RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim. These risks are particularly acute when RAG is used to customize specialized LLM applications with private knowledge bases. Initial studies have explored these risks, but they often lack a formalized framework, robust attack performance, and comprehensive evaluation, leaving critical questions about real-world EDEA feasibility unanswered. In this paper, we present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs. We first formally define EDEAs and propose a unified framework that decomposes their design into three components: extraction instruction, jailbreak operator, and retrieval trigger. Prior attacks can be viewed as instances of this framework. Guided by it, we develop SECRET: a Scalable and EffeCtive exteRnal data Extraction aTtack. Specifically, SECRET incorporates (1) an adaptive optimization process using LLMs as optimizers to generate specialized jailbreak prompts for EDEAs, and (2) cluster-focused triggering, an adaptive strategy that alternates between global exploration and local exploitation to efficiently generate effective retrieval triggers. Extensive evaluations across 4 models reveal that SECRET significantly outperforms previous attacks and is highly effective against all 16 tested RAG instances. Notably, SECRET is the first attack to extract data from RAG powered by Claude 3.7 Sonnet, recovering 35% of it, whereas other attacks yield 0% extraction. Our findings call for attention to this emerging threat.

Key Contributions

First formalized framework for external data extraction attacks (EDEAs) decomposing adversarial query design into three components: extraction instruction, jailbreak operator, and retrieval trigger
SECRET attack combining adaptive LLM-as-optimizer jailbreak generation with cluster-focused triggering that alternates global exploration and local exploitation to maximize knowledge base coverage
Comprehensive evaluation across 16 RAG instances and 4 models, including Claude 3.7 Sonnet, against which SECRET achieves 35% extraction where all prior attacks yield 0%
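
The 35% and 0% figures quoted on this page are extraction rates. This summary does not state how the paper decides that a document has been extracted, so the sketch below assumes one simple criterion: a knowledge-base document counts as extracted when its whitespace-normalized, lowercased text appears verbatim in at least one model response. The names normalize and extraction_rate are illustrative and are not taken from the paper.

```python
from typing import Iterable, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def extraction_rate(documents: Sequence[str], responses: Iterable[str]) -> float:
    """Fraction of documents whose normalized text occurs verbatim in any response.

    Assumption: verbatim containment is the success criterion; the paper may
    use a different measure (for example, a similarity threshold).
    """
    if not documents:
        return 0.0
    normalized_responses = [normalize(r) for r in responses]
    extracted = sum(
        1
        for doc in documents
        if any(normalize(doc) in resp for resp in normalized_responses)
    )
    return extracted / len(documents)


if __name__ == "__main__":
    # Toy knowledge base and model outputs, invented purely for illustration.
    kb = [
        "Alpha record one.",
        "Beta record two.",
        "Gamma record three.",
        "Delta record four.",
    ]
    outputs = ["Here is the context: alpha   record one.", "I cannot share that."]
    print(extraction_rate(kb, outputs))  # 0.25
```

A measure like this can also serve on the defensive side, for example to audit how much of a private knowledge base leaks into logged responses during red-team testing.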

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

black_box, inference_time, targeted

Datasets

16 custom RAG instances spanning medical, financial, and enterprise document domains

Applications

rag systems, llm applications with private knowledge bases, enterprise ai assistants, medical and financial domain llms

Similar Papers

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

Tricking LLM-Based NPCs into Spilling Secrets

Bypassing Prompt Guards in Production with Controlled-Release Prompting

EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System

CLIOPATRA: Extracting Private Information from LLM Insights

Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior

Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace