Shuo Shao

attack arXiv Oct 3, 2025 · Oct 2025

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Proposes SECRET, an adaptive jailbreak-plus-retrieval-trigger attack that extracts RAG knowledge base contents verbatim from leading commercial LLMs

Sensitive Information Disclosure Prompt Injection nlp

1 citation

In recent years, retrieval-augmented generation (RAG) has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG alleviates issues like outdated knowledge and, crucially, insufficient domain expertise. While effective, RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim. These risks are particularly acute when RAG is used to customize specialized LLM applications with private knowledge bases. Initial studies have explored these risks, but they often lack a formalized framework, robust attack performance, and comprehensive evaluation, leaving critical questions about real-world EDEA feasibility unanswered. In this paper, we present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs. We first formally define EDEAs and propose a unified framework that decomposes their design into three components: extraction instruction, jailbreak operator, and retrieval trigger. Prior attacks can be viewed as instances of this framework. Guided by it, we develop SECRET: a Scalable and EffeCtive exteRnal data Extraction aTtack. Specifically, SECRET incorporates (1) an adaptive optimization process using LLMs as optimizers to generate specialized jailbreak prompts for EDEAs, and (2) cluster-focused triggering, an adaptive strategy that alternates between global exploration and local exploitation to efficiently generate effective retrieval triggers. Extensive evaluations across 4 models reveal that SECRET significantly outperforms previous attacks and is highly effective against all 16 tested RAG instances. Notably, SECRET successfully extracts 35% of the data from RAG powered by Claude 3.7 Sonnet for the first time, whereas other attacks yield 0% extraction. Our findings call for attention to this emerging threat.
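The abstract reports results as a percentage of data extracted, but does not define how that percentage is computed. As a rough illustration only, the sketch below assumes one plausible definition: the fraction of knowledge-base chunks that appear verbatim (after whitespace and case normalization) in at least one model output. The function names `normalize` and `extraction_rate`, the normalization rule, and the sample data are all hypothetical and are not taken from the paper.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def extraction_rate(knowledge_base: list[str], outputs: list[str]) -> float:
    """Fraction of knowledge-base chunks found verbatim in at least one output.

    Assumed metric for illustration; the paper may define extraction differently.
    """
    if not knowledge_base:
        return 0.0
    normalized_outputs = [normalize(o) for o in outputs]
    recovered = sum(
        1
        for chunk in knowledge_base
        if any(normalize(chunk) in out for out in normalized_outputs)
    )
    return recovered / len(knowledge_base)


# Made-up data: four private chunks, two model responses.
kb = ["Alpha policy text.", "Beta pricing table.", "Gamma contact list.", "Delta roadmap."]
outs = ["Sure, here it is: alpha  policy text.", "I cannot share that."]
rate = extraction_rate(kb, outs)
print(f"extraction rate: {rate:.0%}")
```

A metric of this kind is also useful on the defensive side: running it against logged responses gives a simple estimate of how much of a private knowledge base has leaked.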

llm transformer Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security · Nanyang Technological University

attack arXiv Nov 6, 2025 · Nov 2025

Black-Box Guardrail Reverse-engineering Attack

Hongwei Yao, Yun Xia, Shuo Shao et al. · City University of Hong Kong · Hangzhou Dianzi University +1 more

Clones black-box LLM guardrail policies via RL and genetic algorithms, achieving 0.92 fidelity for under $85 in API queries

Model Theft Model Theft nlp

attack arXiv Dec 9, 2025 · Dec 2025

MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

Tailun Chen, Yu He, Yan Wang et al. · Zhejiang University · Alibaba Group +1 more

Black-box RAG corpus poisoning attack using persona-driven query synthesis, semantic anchoring, and adversarial preference optimization to mislead LLMs

Data Poisoning Attack Prompt Injection nlp
