Retrieval Pivot Attacks in Hybrid RAG: Measuring and Mitigating Amplified Leakage from Vector Seeds to Graph Expansion
Published on arXiv
2602.08668
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
Undefended hybrid RAG achieves RPR up to 0.95 with ~160–194× amplification over vector-only retrieval; per-hop authorization at the graph expansion boundary alone reduces RPR to 0.0 across all corpora and attack variants
Retrieval Pivot Attack / Retrieval Pivot Risk (RPR)
Novel technique introduced
Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We show that this composition introduces a distinct security failure mode: a vector-retrieved "seed" chunk can pivot via entity links into sensitive graph neighborhoods, causing cross-tenant data leakage that does not occur in vector-only retrieval. We formalize this risk as Retrieval Pivot Risk (RPR) and introduce companion metrics Leakage@k, Amplification Factor, and Pivot Depth (PD) to quantify leakage magnitude and traversal structure. We present seven Retrieval Pivot Attacks that exploit the vector-to-graph boundary and show that adversarial injection is not required: naturally shared entities create cross-tenant pivot paths organically. Across a synthetic multi-tenant enterprise corpus and the Enron email corpus, the undefended hybrid pipeline exhibits high pivot risk (RPR up to 0.95) with multiple unauthorized items returned per query. Leakage consistently appears at PD=2, which we attribute to the bipartite chunk-entity topology and formalize as a proposition. We then show that enforcing authorization at a single location, the graph expansion boundary, eliminates measured leakage (RPR near 0) across both corpora, all attack variants, and label forgery rates up to 10 percent, with minimal overhead. Our results indicate the root cause is boundary enforcement, not inherently complex defenses: two individually secure retrieval components can compose into an insecure system unless authorization is re-checked at the transition point.
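The pivot described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the toy `CHUNKS` data, the tenant labels, and the helper names `build_entity_index`, `expand`, and `leaked` are hypothetical. The sketch assumes the vector stage returns only authorized seeds, and that graph expansion walks a bipartite chunk-entity graph, so reaching another chunk costs two hops (chunk to entity, entity to chunk).

```python
from collections import deque

# Hypothetical toy corpus: chunk id -> (owning tenant, entities mentioned).
# "acme_corp" is an organically shared entity; no adversarial injection is modeled.
CHUNKS = {
    "a1": ("tenant_a", {"acme_corp"}),
    "a2": ("tenant_a", {"project_x"}),
    "b1": ("tenant_b", {"acme_corp", "merger_plan"}),
    "b2": ("tenant_b", {"merger_plan"}),
}


def build_entity_index(chunks):
    """Map each entity to the set of chunk ids that mention it."""
    index = {}
    for chunk_id, (_, entities) in chunks.items():
        for entity in entities:
            index.setdefault(entity, set()).add(chunk_id)
    return index


def expand(seeds, user_tenant, max_hops, per_hop_auth):
    """Breadth-first expansion from seed chunks; returns {chunk_id: hop depth}.

    With per_hop_auth=True, a chunk is admitted only if its tenant label
    matches the requesting user's tenant (an assumed, simplified policy).
    """
    index = build_entity_index(CHUNKS)
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        chunk_id = queue.popleft()
        # chunk -> entity -> chunk costs two hops in the bipartite graph.
        if depth[chunk_id] + 2 > max_hops:
            continue
        for entity in sorted(CHUNKS[chunk_id][1]):
            for neighbor in sorted(index[entity]):
                if neighbor in depth:
                    continue
                if per_hop_auth and CHUNKS[neighbor][0] != user_tenant:
                    continue  # authorization re-checked at the expansion boundary
                depth[neighbor] = depth[chunk_id] + 2
                queue.append(neighbor)
    return depth


def leaked(depth, user_tenant):
    """Chunks in the result that belong to a different tenant."""
    return {cid: d for cid, d in depth.items() if CHUNKS[cid][0] != user_tenant}


seeds = ["a1"]  # assumed output of an authorized vector search for tenant_a
undefended = expand(seeds, "tenant_a", max_hops=2, per_hop_auth=False)
defended = expand(seeds, "tenant_a", max_hops=2, per_hop_auth=True)
```

In this toy setup the undefended expansion reaches the tenant_b chunk `b1` at depth 2 through the shared entity, while the defended expansion returns only the seed. Because chunks connect only through entities, depth 2 is the shallowest point at which a foreign chunk can appear here, which matches the PD=2 observation above.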
Key Contributions
- Formalizes Retrieval Pivot Risk (RPR) and companion metrics (Leakage@k, Amplification Factor, Pivot Depth) to quantify cross-tenant leakage probability, magnitude, and traversal structure in hybrid RAG pipelines
- Presents a taxonomy of seven Retrieval Pivot Attacks that exploit the vector-to-graph boundary and demonstrates that adversarial injection is unnecessary: organically shared entities create cross-tenant pivot paths (RPR up to 0.95 on the synthetic corpus, 0.70 on Enron), with leakage consistently appearing at PD=2, a structural property of the bipartite chunk-entity topology
- Shows that per-hop authorization enforced solely at the graph expansion boundary (D1) eliminates all measured leakage (RPR → 0.0) across all evaluated corpora and attack variants with negligible latency overhead, framing the root cause as missing boundary enforcement rather than a lack of complex defenses
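The metrics named above could be computed along the following lines. This summary does not give the formal definitions, so the ones used here are assumptions: RPR as the fraction of queries that return at least one unauthorized item, Leakage@k as the mean number of unauthorized items in the top k, Amplification Factor as the ratio of hybrid to vector-only Leakage@k, and Pivot Depth as the shallowest hop depth of an unauthorized item. The `QueryResult` type and the example data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryResult:
    # Items in rank order; each is (authorized: bool, hop_depth: int).
    items: list


def rpr(results):
    """Assumed definition: share of queries with at least one unauthorized item."""
    if not results:
        return 0.0
    hits = sum(any(not ok for ok, _ in r.items) for r in results)
    return hits / len(results)


def leakage_at_k(results, k):
    """Assumed definition: mean count of unauthorized items in the top k."""
    if not results:
        return 0.0
    total = sum(sum(1 for ok, _ in r.items[:k] if not ok) for r in results)
    return total / len(results)


def pivot_depth(result):
    """Assumed definition: smallest hop depth of any unauthorized item, else None."""
    depths = [d for ok, d in result.items if not ok]
    return min(depths) if depths else None


def amplification_factor(hybrid, vector_only, k):
    """Assumed definition: hybrid Leakage@k divided by vector-only Leakage@k.

    Convention chosen for this sketch only: a zero baseline yields infinity
    if the hybrid pipeline leaks anything, and 1.0 if neither pipeline leaks.
    """
    base = leakage_at_k(vector_only, k)
    hyb = leakage_at_k(hybrid, k)
    if base == 0:
        return math.inf if hyb > 0 else 1.0
    return hyb / base


# Hypothetical results for two queries under each pipeline.
hybrid = [
    QueryResult([(True, 0), (False, 2), (False, 2)]),
    QueryResult([(True, 0), (True, 2)]),
]
vector_only = [
    QueryResult([(True, 0), (False, 0)]),
    QueryResult([(True, 0)]),
]

summary = {
    "rpr": rpr(hybrid),
    "leakage_at_5": leakage_at_k(hybrid, 5),
    "amplification": amplification_factor(hybrid, vector_only, 5),
    "pivot_depth_q1": pivot_depth(hybrid[0]),
}
```

The numbers in this example are made up to exercise the functions and bear no relation to the reported results. How a zero vector-only baseline is handled strongly affects any amplification figure, so the convention used here should be replaced with whatever the paper specifies.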