Defense · 2026

Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks

Yu-Che Tsai 1, Hsiang Hsiao 1, Kuan-Yu Chen 1, Shou-De Lin 1,2



Published on arXiv · arXiv:2602.07090

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

SPARSE consistently reduces privacy leakage from embedding inversion attacks while achieving superior downstream task performance compared to state-of-the-art DP methods across six datasets and three embedding models.

SPARSE

Novel technique introduced


Text embeddings enable numerous NLP applications but face severe privacy risks from embedding inversion attacks, which can expose sensitive attributes or reconstruct raw text. Existing differential privacy defenses assume uniform sensitivity across embedding dimensions, leading to excessive noise and degraded utility. We propose SPARSE, a user-centric framework for concept-specific privacy protection in text embeddings. SPARSE combines (1) differentiable mask learning to identify privacy-sensitive dimensions for user-defined concepts, and (2) the Mahalanobis mechanism that applies elliptical noise calibrated by dimension sensitivity. Unlike traditional spherical noise injection, SPARSE selectively perturbs privacy-sensitive dimensions while preserving non-sensitive semantics. Evaluated across six datasets with three embedding models and attack scenarios, SPARSE consistently reduces privacy leakage while achieving superior downstream performance compared to state-of-the-art DP methods.
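The abstract contrasts traditional spherical (isotropic) noise with SPARSE's elliptical, dimension-calibrated perturbation. The sketch below illustrates that contrast in numpy; the function names, the normalization of the sensitivity profile, and the specific noise scales are illustrative assumptions, not the paper's actual Mahalanobis calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

def spherical_noise(emb, sigma):
    # Baseline DP-style perturbation: isotropic Gaussian noise,
    # the same scale on every embedding dimension.
    return emb + rng.normal(0.0, sigma, size=emb.shape)

def elliptical_noise(emb, sigma, sensitivity):
    # Elliptical (Mahalanobis-style) perturbation: per-dimension noise
    # scaled by a sensitivity profile, so privacy-sensitive dimensions
    # receive more noise while non-sensitive semantics are preserved.
    # Normalizing by the mean keeps the average noise budget comparable
    # to the spherical baseline (an assumption for this toy sketch).
    scale = sensitivity / sensitivity.mean()
    return emb + rng.normal(0.0, 1.0, size=emb.shape) * sigma * np.sqrt(scale)

d = 8
emb = rng.normal(size=d)
# Hypothetical profile: the first two dimensions encode the sensitive concept.
sens = np.array([4.0, 4.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
noisy = elliptical_noise(emb, sigma=0.1, sensitivity=sens)
```

Under this sketch, the expected perturbation on the two sensitive dimensions is several times larger than on the rest, while the total noise budget matches the spherical baseline.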


Key Contributions

  • Differentiable mask learning that identifies and isolates privacy-sensitive embedding dimensions for user-defined concepts
  • Mahalanobis noise mechanism applying elliptical (dimension-calibrated) DP noise to selectively perturb sensitive dimensions while preserving non-sensitive semantics
  • Empirical evaluation across six datasets and three embedding models showing SPARSE achieves better privacy-utility tradeoff than uniform spherical DP baselines
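The first contribution, differentiable mask learning, relaxes a binary dimension mask to a sigmoid over learnable logits so it can be trained by gradient descent. The toy below shows that relaxation on a made-up per-dimension sensitivity score with a hand-derived gradient; the loss, the score, and all names are illustrative assumptions, not the paper's objective.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_mask(sensitivity, lam=1.0, lr=0.5, steps=500):
    """Toy differentiable mask learning (illustrative, not the paper's loss).

    Relax the binary mask to m = sigmoid(logits) and minimise
        L = -sum(m * s) + lam * sum(m),
    rewarding coverage of high-sensitivity dimensions s while penalising
    mask size. The analytic gradient wrt the logits is
        dL/dlogits = (lam - s) * m * (1 - m).
    """
    logits = np.zeros_like(sensitivity)
    for _ in range(steps):
        m = sigmoid(logits)
        logits -= lr * (lam - sensitivity) * m * (1.0 - m)
    return sigmoid(logits)

# Hypothetical sensitivity scores: first two dimensions carry the concept.
sens = np.array([3.0, 2.5, 0.2, 0.1, 0.05])
mask = learn_mask(sens)
```

Dimensions whose sensitivity exceeds the sparsity weight `lam` are driven toward mask value 1 (and would then receive the elliptical noise), while the rest are driven toward 0 and stay unperturbed.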

🛡️ Threat Analysis

Model Inversion Attack

The paper explicitly defends against embedding inversion attacks, where an adversary reconstructs raw text or extracts sensitive attributes from embedding vectors. ML03 directly covers 'embedding inversion (recovering text/data from embedding vectors)' as a model inversion attack, and SPARSE is evaluated against concrete reconstruction attack scenarios.


Details

Domains
nlp
Model Types
transformer
Threat Tags
inference_time, white_box
Datasets
six unspecified datasets (per abstract)
Applications
text embedding services, NLP downstream tasks, privacy-sensitive text retrieval