attack 2026

Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects

Hanxi Li 1, Jianan Zhou 1, Jiale Lao 1,2, Yibo Wang 3, Zhengmao Ye 1, Yang Cao 4, Junfen Wang 1, Mingjie Tang 1

0 citations

Published on arXiv: 2604.05480

Data Poisoning Attack

OWASP ML Top 10 — ML02

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Injecting malicious vectors amounting to 1% of a 100K-entry database contaminates 50-90% of top-10 results, with an attack success rate of up to 99.85%

Black-Hole Attack

Novel technique introduced


Vector databases serve as the retrieval backbone of modern AI applications, yet their security remains largely unexplored. We propose the Black-Hole Attack, a poisoning attack that injects a small number of malicious vectors near the geometric center of the stored vectors. These injected vectors attract queries like a black hole and frequently appear in the top-k retrieval results for most queries. This attack is enabled by a phenomenon we term centrality-driven hubness: in high-dimensional embedding spaces, vectors near the centroid become nearest neighbors of a disproportionately large number of other vectors, while this centroid region is nearly empty in practice. The attack shows that vectors in a vector database cannot be blindly trusted: geometric defects in high-dimensional embeddings make retrieval inherently vulnerable. Our experiments show that malicious vectors appear in up to 99.85% of top-10 results. Additionally, we evaluate existing hubness mitigation methods as potential defenses against the Black-Hole Attack. The results show that these methods either significantly reduce retrieval accuracy or provide limited protection, which indicates the need for more robust defenses against the Black-Hole Attack.
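
The mechanism described in the abstract can be illustrated with a small synthetic simulation. This is only a sketch under stated assumptions, not the authors' construction: it assumes unit-normalized random vectors as a stand-in for real embeddings, Euclidean distance for retrieval, and malicious vectors placed at the empirical centroid plus a small jitter. The function names `unit_vector`, `sq_dist`, and `simulate`, and all parameter values, are hypothetical.

```python
import math
import random


def unit_vector(rng, dim):
    """Draw a random direction on the unit sphere (stand-in for an embedding)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simulate(n_clean=500, n_malicious=5, dim=64, n_queries=100, k=10, seed=0):
    """Return (fraction of queries hit, fraction of top-k slots contaminated)."""
    rng = random.Random(seed)
    clean = [unit_vector(rng, dim) for _ in range(n_clean)]

    # Empirical centroid of the stored vectors; it lies deep inside the sphere.
    centroid = [sum(col) / n_clean for col in zip(*clean)]

    # Assumption: the attacker places vectors at the centroid plus tiny noise.
    malicious = [
        [c + rng.gauss(0.0, 1e-3) for c in centroid] for _ in range(n_malicious)
    ]
    database = clean + malicious  # indices >= n_clean are malicious

    hit_queries = 0
    malicious_slots = 0
    for _ in range(n_queries):
        query = unit_vector(rng, dim)
        ranked = sorted(
            range(len(database)), key=lambda i: sq_dist(query, database[i])
        )
        n_bad = sum(1 for i in ranked[:k] if i >= n_clean)
        malicious_slots += n_bad
        if n_bad > 0:
            hit_queries += 1

    return hit_queries / n_queries, malicious_slots / (n_queries * k)


if __name__ == "__main__":
    success_rate, slot_rate = simulate()
    print(f"queries with a malicious top-{10} hit: {success_rate:.2%}")
    print(f"top-{10} slots taken by malicious vectors: {slot_rate:.2%}")
```

In this toy setting the injected vectors make up roughly 1% of the database, yet a point near the centroid sits at distance about 1 from every unit-norm query, while two random unit vectors in high dimensions are typically about √2 apart. That gap is why the centroid region attracts queries here; real embedding models and other distance metrics may behave differently.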


Key Contributions

  • Identifies the centrality-driven hubness phenomenon in high-dimensional embedding spaces, where vectors near the centroid become nearest neighbors of disproportionately many queries
  • Proposes the Black-Hole Attack, which injects malicious vectors amounting to only 1% of the database and contaminates 50-90% of top-10 retrieval results
  • Evaluates existing hubness mitigation methods as defenses and finds that they either significantly reduce retrieval accuracy or provide limited protection
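
The hubness effect named in the first contribution can be measured with the standard k-occurrence count: the number of other points that list a given point among their k nearest neighbors. The sketch below is an illustration only, assuming synthetic i.i.d. Gaussian data and Euclidean distance rather than the embedding models and datasets used in the paper; `k_occurrence` and `hubness_by_centrality` are hypothetical helper names.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_occurrence(points, k):
    """counts[i] = how many other points have point i among their k nearest."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: sq_dist(points[i], points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def hubness_by_centrality(n=200, dim=50, k=5, seed=0):
    """Compare mean k-occurrence of the most central and most peripheral 10%."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    centroid = [sum(col) / n for col in zip(*points)]

    counts = k_occurrence(points, k)

    # Order points from closest to farthest from the empirical centroid.
    order = sorted(range(n), key=lambda i: sq_dist(points[i], centroid))
    tenth = n // 10
    central = sum(counts[i] for i in order[:tenth]) / tenth
    peripheral = sum(counts[i] for i in order[-tenth:]) / tenth
    return central, peripheral


if __name__ == "__main__":
    central, peripheral = hubness_by_centrality()
    print(f"mean k-occurrence, 10% closest to centroid:  {central:.2f}")
    print(f"mean k-occurrence, 10% farthest from centroid: {peripheral:.2f}")
```

Every point contributes exactly k neighbor-list entries, so the average k-occurrence over the whole set equals k. A central group that scores well above k while the peripheral group scores well below it indicates that neighbor lists concentrate on points near the centroid.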

🛡️ Threat Analysis

Data Poisoning Attack

The Black-Hole Attack is a data poisoning attack that injects malicious vectors into the vector database at inference time, corrupting retrieval results by exploiting geometric defects in high-dimensional embedding spaces. The attack vector is the injected data itself.


Details

Domains
nlp, multimodal
Model Types
transformer, llm
Threat Tags
inference_time, untargeted, digital
Datasets
Three embedding models and three datasets are mentioned, but none is named in the abstract or body excerpt
Applications
vector databases, rag systems, semantic search, llm retrieval