Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects

Vector databases serve as the retrieval backbone of modern AI applications, yet their security remains largely unexplored. We propose the Black-Hole Attack, a poisoning attack that injects a small number of malicious vectors near the geometric center of the stored vectors. These injected vectors attract queries like a black hole and frequently appear in the top-k retrieval results for most queries. This attack is enabled by a phenomenon we term centrality-driven hubness: in high-dimensional embedding spaces, vectors near the centroid become nearest neighbors of a disproportionately large number of other vectors, while this centroid region is nearly empty in practice. The attack shows that vectors in a vector database cannot be blindly trusted: geometric defects in high-dimensional embeddings make retrieval inherently vulnerable. Our experiments show that malicious vectors appear in up to 99.85% of top-10 results. Additionally, we evaluate existing hubness mitigation methods as potential defenses against the Black-Hole Attack. The results show that these methods either significantly reduce retrieval accuracy or provide limited protection, which indicates the need for more robust defenses against the Black-Hole Attack.
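The mechanism described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the "embeddings" are synthetic Gaussian vectors, retrieval uses exact Euclidean top-k search, and the function name `simulate_black_hole` and all of its parameters are hypothetical. It compares injecting vectors near the centroid against injecting the same number of ordinary-looking vectors.

```python
import numpy as np


def simulate_black_hole(n_clean=2000, n_queries=200, dim=128,
                        poison_frac=0.01, k=10, near_centroid=True, seed=0):
    """Return the fraction of queries whose top-k contains an injected vector."""
    rng = np.random.default_rng(seed)

    # Assumption: synthetic Gaussian data stands in for real embeddings.
    clean = rng.normal(size=(n_clean, dim)) + 0.5
    queries = rng.normal(size=(n_queries, dim)) + 0.5

    n_poison = max(1, int(poison_frac * n_clean))
    if near_centroid:
        # Attack: place the injected vectors in a tight cluster at the centroid.
        centroid = clean.mean(axis=0)
        poison = centroid + 0.01 * rng.normal(size=(n_poison, dim))
    else:
        # Baseline: injected vectors drawn from the same distribution as the data.
        poison = rng.normal(size=(n_poison, dim)) + 0.5

    db = np.vstack([clean, poison])
    is_poison = np.arange(len(db)) >= n_clean

    # Squared Euclidean distance between every query and every stored vector.
    d2 = ((queries ** 2).sum(axis=1)[:, None]
          - 2.0 * queries @ db.T
          + (db ** 2).sum(axis=1)[None, :])

    # Indices of the k closest stored vectors per query (unordered within the k).
    topk = np.argpartition(d2, k, axis=1)[:, :k]
    return float(is_poison[topk].any(axis=1).mean())


if __name__ == "__main__":
    print("near centroid:", simulate_black_hole(near_centroid=True))
    print("random baseline:", simulate_black_hole(near_centroid=False))
```

On this synthetic data the centroid placement should reach far more queries than the baseline, but the numbers are not comparable to the reported results, which depend on real embedding models, datasets, and the similarity metric and index used.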

Key Contributions

Discovers a centrality-driven hubness phenomenon in high-dimensional embedding spaces, in which vectors near the centroid become nearest neighbors of disproportionately many queries
Proposes the Black-Hole Attack, which injects only 1% malicious vectors to contaminate 50-90% of top-10 retrieval results
Evaluates existing hubness mitigation methods as defenses and finds them inadequate
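The hubness phenomenon in the first contribution can be measured with the k-occurrence count N_k(x), the number of points that list x among their k nearest neighbors. The sketch below is illustrative only: it assumes synthetic Gaussian data and Euclidean distance, measures hubness among the stored points themselves rather than against external queries, and uses the hypothetical helper names `k_occurrence` and `centrality_hubness_correlation`.

```python
import numpy as np


def k_occurrence(points, k=10):
    """N_k(x): how many other points list x among their k nearest neighbors."""
    sq = (points ** 2).sum(axis=1)
    d2 = sq[:, None] - 2.0 * points @ points.T + sq[None, :]
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    knn = np.argpartition(d2, k, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=len(points))


def centrality_hubness_correlation(n=1000, dim=64, k=10, seed=0):
    """Correlate each point's distance to the centroid with its k-occurrence."""
    rng = np.random.default_rng(seed)
    points = rng.normal(size=(n, dim))  # assumption: synthetic stand-in data
    counts = k_occurrence(points, k)
    dist_to_centroid = np.linalg.norm(points - points.mean(axis=0), axis=1)
    corr = np.corrcoef(dist_to_centroid, counts)[0, 1]
    return counts, float(corr)


if __name__ == "__main__":
    counts, corr = centrality_hubness_correlation()
    print("max N_k:", counts.max(), "mean N_k:", counts.mean())
    print("correlation(distance to centroid, N_k):", corr)
```

A negative correlation means that points closer to the centroid collect more nearest-neighbor slots, which is the property the attack exploits by placing vectors in that region.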

🛡️ Threat Analysis

Data Poisoning Attack

The Black-Hole Attack is a data poisoning attack that injects malicious vectors into the vector database at inference time, corrupting retrieval results by exploiting geometric defects in high-dimensional embedding spaces. The attack vector is the injected data itself.