
Semantic Leakage from Image Embeddings

Yiyi Chen 1, Qiongkai Xu 2, Desmond Elliott 3, Qiongxiu Li 1, Johannes Bjerva 1


Published on arXiv: 2601.22929

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Semantic tags, symbolic representations, and coherent natural language descriptions are consistently recoverable from compressed image embeddings produced by GEMINI, COHERE, NOMIC, and CLIP using only black-box access to the embeddings.

SLImE

Novel technique introduced


Abstract

Image embeddings are generally assumed to pose limited privacy risk. We challenge this assumption by formalizing semantic leakage as the ability to recover semantic structures from compressed image embeddings. Surprisingly, we show that semantic leakage does not require exact reconstruction of the original image. Preserving local semantic neighborhoods under embedding alignment is sufficient to expose the intrinsic vulnerability of image embeddings. Crucially, this preserved neighborhood structure allows semantic information to propagate through a sequence of lossy mappings. Based on this conjecture, we propose Semantic Leakage from Image Embeddings (SLImE), a lightweight inference framework that reveals semantic information from standalone compressed image embeddings, incorporating a locally trained semantic retriever with off-the-shelf models, without training task-specific decoders. We thoroughly validate each step of the framework empirically, from aligned embeddings to retrieved tags, symbolic representations, and grammatical and coherent descriptions. We evaluate SLImE across a range of open and closed embedding models, including GEMINI, COHERE, NOMIC, and CLIP, and demonstrate consistent recovery of semantic information across diverse inference tasks. Our results reveal a fundamental vulnerability in image embeddings, whereby the preservation of semantic neighborhoods under alignment enables semantic leakage, highlighting challenges for privacy preservation.
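The paper does not include reference code here, so the following is a minimal sketch of the first SLImE step as described in the abstract: align victim embeddings into a locally indexed retriever space, then recover semantic tags by nearest-neighbor retrieval. The linear least-squares alignment, the dimensions, the `recover_tags` helper, and the tag vocabulary are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: victim (black-box) space vs. local retriever space.
d_victim, d_local, n_pairs = 64, 32, 500

# Paired embeddings of the same images in both spaces (synthetic stand-ins
# for embeddings the attacker computes on public images).
X_victim = rng.standard_normal((n_pairs, d_victim))
X_local = rng.standard_normal((n_pairs, d_local))

# Step 1: fit a linear alignment map W so that X_victim @ W ~ X_local.
# (Assumption: a simple least-squares map; the paper's alignment may differ.)
W, *_ = np.linalg.lstsq(X_victim, X_local, rcond=None)

# Step 2: a small local index of tag embeddings (hypothetical vocabulary).
tags = ["dog", "beach", "sunset", "car", "person"]
tag_vecs = rng.standard_normal((len(tags), d_local))
tag_vecs /= np.linalg.norm(tag_vecs, axis=1, keepdims=True)

def recover_tags(victim_vec, k=2):
    """Align a standalone victim embedding, then retrieve the k nearest tags."""
    z = victim_vec @ W
    z /= np.linalg.norm(z)
    sims = tag_vecs @ z                     # cosine similarity to each tag
    return [tags[i] for i in np.argsort(-sims)[:k]]

print(recover_tags(rng.standard_normal(d_victim)))
```

The point of the sketch is the threat model: everything above needs only embedding vectors and public data, never the victim model's weights.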


Key Contributions

  • Formalizes semantic leakage from image embeddings, showing exact reconstruction is unnecessary — preserved local semantic neighborhoods are sufficient to enable privacy attacks
  • Proposes SLImE, a lightweight framework using embedding alignment and a locally-trained semantic retriever to extract tags, symbolic representations, and natural language descriptions from standalone compressed embeddings
  • Demonstrates consistent semantic recovery across diverse open and closed embedding models (GEMINI, COHERE, NOMIC, CLIP) without access to model weights or task-specific decoders
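The contributions hinge on local semantic neighborhoods surviving alignment. One simple way to probe this empirically (a sketch under our own assumptions, not the authors' evaluation protocol) is to measure the mean Jaccard overlap between each point's k-nearest-neighbor set before and after a mapping:

```python
import numpy as np

def knn_sets(X, k):
    """Index set of each row's k nearest neighbors by cosine similarity (self excluded)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)          # never count a point as its own neighbor
    return [set(np.argsort(-row)[:k]) for row in sims]

def neighborhood_overlap(X_orig, X_mapped, k=10):
    """Mean Jaccard overlap of k-NN neighborhoods; 1.0 = perfectly preserved."""
    a, b = knn_sets(X_orig, k), knn_sets(X_mapped, k)
    return float(np.mean([len(s & t) / len(s | t) for s, t in zip(a, b)]))

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 64))
# An orthogonal map (rotation) preserves cosine similarities, hence neighborhoods.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
print(neighborhood_overlap(X, X @ Q))   # → 1.0
```

High overlap after a lossy mapping is exactly the condition the paper argues makes semantic leakage possible without exact image reconstruction.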

🛡️ Threat Analysis

Model Inversion Attack

SLImE is an embedding inversion attack: an adversary with no access to model internals receives image embeddings and recovers semantic content (tags, symbolic representations, coherent descriptions) from them. This directly matches the ML03 definition of "embedding inversion (recovering text/data from embedding vectors)" under a realistic black-box threat model.


Details

Domains
vision, multimodal
Model Types
vlm, multimodal, transformer
Threat Tags
black_box, inference_time
Applications
image retrieval systems, embedding APIs, multimodal AI systems