
Semantic Leakage from Image Embeddings

Yiyi Chen 1, Qiongkai Xu 2, Desmond Elliott 3, Qiongxiu Li 1, Johannes Bjerva 1


Published on arXiv: 2601.22929

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Semantic tags, symbolic representations, and coherent natural language descriptions are consistently recoverable from compressed image embeddings produced by GEMINI, COHERE, NOMIC, and CLIP using only black-box access to the embeddings.

SLImE

Novel technique introduced


Abstract

Image embeddings are generally assumed to pose limited privacy risk. We challenge this assumption by formalizing semantic leakage as the ability to recover semantic structures from compressed image embeddings. Surprisingly, we show that semantic leakage does not require exact reconstruction of the original image. Preserving local semantic neighborhoods under embedding alignment is sufficient to expose the intrinsic vulnerability of image embeddings. Crucially, this preserved neighborhood structure allows semantic information to propagate through a sequence of lossy mappings. Based on this conjecture, we propose Semantic Leakage from Image Embeddings (SLImE), a lightweight inference framework that reveals semantic information from standalone compressed image embeddings, incorporating a locally trained semantic retriever with off-the-shelf models, without training task-specific decoders. We thoroughly validate each step of the framework empirically, from aligned embeddings to retrieved tags, symbolic representations, and grammatical and coherent descriptions. We evaluate SLImE across a range of open and closed embedding models, including GEMINI, COHERE, NOMIC, and CLIP, and demonstrate consistent recovery of semantic information across diverse inference tasks. Our results reveal a fundamental vulnerability in image embeddings, whereby the preservation of semantic neighborhoods under alignment enables semantic leakage, highlighting challenges for privacy preservation.
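The paper does not include reference code here, so the following is a minimal sketch of the first SLImE step as described in the abstract: align victim embeddings into a locally indexed retriever space, then recover semantic tags by nearest-neighbor retrieval. The linear least-squares alignment, the dimensions, the `recover_tags` helper, and the tag vocabulary are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: victim (black-box) space vs. local retriever space.
d_victim, d_local, n_pairs = 64, 32, 500

# Paired embeddings of the same images in both spaces (synthetic stand-ins
# for embeddings the attacker computes on public images).
X_victim = rng.standard_normal((n_pairs, d_victim))
X_local = rng.standard_normal((n_pairs, d_local))

# Step 1: fit a linear alignment map W so that X_victim @ W ~ X_local.
# (Assumption: a simple least-squares map; the paper's alignment may differ.)
W, *_ = np.linalg.lstsq(X_victim, X_local, rcond=None)

# Step 2: a small local index of tag embeddings (hypothetical vocabulary).
tags = ["dog", "beach", "sunset", "car", "person"]
tag_vecs = rng.standard_normal((len(tags), d_local))
tag_vecs /= np.linalg.norm(tag_vecs, axis=1, keepdims=True)

def recover_tags(victim_vec, k=2):
    """Align a standalone victim embedding, then retrieve the k nearest tags."""
    z = victim_vec @ W
    z /= np.linalg.norm(z)
    sims = tag_vecs @ z                     # cosine similarity to each tag
    return [tags[i] for i in np.argsort(-sims)[:k]]

print(recover_tags(rng.standard_normal(d_victim)))
```

The point of the sketch is the threat model: everything above needs only embedding vectors and public data, never the victim model's weights.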


Key Contributions

  • Formalizes semantic leakage from image embeddings, showing exact reconstruction is unnecessary — preserved local semantic neighborhoods are sufficient to enable privacy attacks
  • Proposes SLImE, a lightweight framework using embedding alignment and a locally-trained semantic retriever to extract tags, symbolic representations, and natural language descriptions from standalone compressed embeddings
  • Demonstrates consistent semantic recovery across diverse open and closed embedding models (GEMINI, COHERE, NOMIC, CLIP) without access to model weights or task-specific decoders
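The contributions hinge on local semantic neighborhoods surviving alignment. One simple way to probe this empirically (a sketch under our own assumptions, not the authors' evaluation protocol) is to measure the mean Jaccard overlap between each point's k-nearest-neighbor set before and after a mapping:

```python
import numpy as np

def knn_sets(X, k):
    """Index set of each row's k nearest neighbors by cosine similarity (self excluded)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)          # never count a point as its own neighbor
    return [set(np.argsort(-row)[:k]) for row in sims]

def neighborhood_overlap(X_orig, X_mapped, k=10):
    """Mean Jaccard overlap of k-NN neighborhoods; 1.0 = perfectly preserved."""
    a, b = knn_sets(X_orig, k), knn_sets(X_mapped, k)
    return float(np.mean([len(s & t) / len(s | t) for s, t in zip(a, b)]))

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 64))
# An orthogonal map (rotation) preserves cosine similarities, hence neighborhoods.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
print(neighborhood_overlap(X, X @ Q))   # → 1.0
```

High overlap after a lossy mapping is exactly the condition the paper argues makes semantic leakage possible without exact image reconstruction.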

🛡️ Threat Analysis

Model Inversion Attack

SLImE is an embedding inversion attack: an adversary with no access to model internals receives image embeddings and recovers semantic content (tags, symbolic representations, coherent descriptions) from them. This directly matches the ML03 definition of "embedding inversion (recovering text/data from embedding vectors)" under a realistic black-box threat model.


Details

Domains
vision, multimodal
Model Types
vlm, multimodal, transformer
Threat Tags
black_box, inference_time
Applications
image retrieval systems, embedding APIs, multimodal AI systems