When Privacy Isn't Synthetic: Hidden Data Leakage in Generative AI Models
S.M. Mustaqim, Anantaa Kotal, Paul H. Yi
Published on arXiv
arXiv:2512.06062
Membership Inference Attack
OWASP ML Top 10 — ML04
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Cluster overlap between synthetic and real training data produces measurable membership leakage even under differential privacy, across multiple generative model architectures in sensitive domains.
Cluster-Medoid Leakage Attack
Novel technique introduced
Generative models are increasingly used to produce privacy-preserving synthetic data as a safe alternative to sharing sensitive training datasets. However, we demonstrate that such synthetic releases can still leak information about the underlying training samples through structural overlap in the data manifold. We propose a black-box membership inference attack that exploits this vulnerability without requiring access to model internals or real data. The attacker repeatedly queries the generative model to obtain large numbers of synthetic samples, performs unsupervised clustering to identify dense regions of the synthetic distribution, and then analyzes cluster medoids and neighborhoods that correspond to high-density regions in the original training data. These neighborhoods act as proxies for training samples, enabling the adversary to infer membership or reconstruct approximate records. Our experiments across healthcare, finance, and other sensitive domains show that cluster overlap between real and synthetic data leads to measurable membership leakage, even when the generator is trained with differential privacy or other noise mechanisms. The results highlight an under-explored attack surface in synthetic data generation pipelines and call for stronger privacy guarantees that account for distributional neighborhood inference rather than sample-level memorization alone, with direct implications for privacy-preserving data publishing. Implementation and evaluation code are publicly available at: github.com/Cluster-Medoid-Leakage-Attack.
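The attack pipeline described above (query the generator, cluster the synthetic samples, score candidates against cluster medoids) can be sketched as follows. This is a minimal illustration, not the paper's released implementation: the function names, the choice of k-means as the clustering step, and the negated medoid distance as a membership score are all assumptions made here for clarity.

```python
import numpy as np
from sklearn.cluster import KMeans


def medoid(points):
    # Medoid: the actual point in the cluster minimizing total
    # distance to all other points in that cluster.
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return points[pairwise.sum(axis=1).argmin()]


def membership_scores(synthetic, candidates, n_clusters=10, seed=0):
    """Score candidate records by proximity to dense synthetic regions.

    Cluster the synthetic samples, take each cluster's medoid as a proxy
    for a high-density region of the training distribution, and score each
    candidate by (negated) distance to its nearest medoid. A higher score
    suggests the candidate lies in an overlapping dense region, which the
    paper argues correlates with training-set membership.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(synthetic)
    medoids = np.stack(
        [medoid(synthetic[km.labels_ == k]) for k in range(n_clusters)]
    )
    dists = np.linalg.norm(candidates[:, None, :] - medoids[None, :, :], axis=-1)
    # Negate so that larger score means "more likely a training member".
    return -dists.min(axis=1)
```

In a real attack the `synthetic` array would be filled by repeated black-box queries to the generator; thresholding the scores then yields a binary membership decision, and the medoids themselves serve as approximate record reconstructions.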
Key Contributions
- A black-box membership inference attack (Cluster-Medoid Leakage Attack) that requires no model internals or real data, only repeated queries to the generative model
- Demonstrates that unsupervised clustering of synthetic outputs reveals high-density regions that overlap with real training data, exposing membership and enabling approximate record reconstruction
- Shows that distributional neighborhood leakage persists even when generators are trained with differential privacy, across GANs, VAEs, diffusion models, and LLMs in healthcare and finance domains
🛡️ Threat Analysis
Beyond binary membership inference, the attack enables approximate reconstruction of training records: cluster medoids act as proxies, and cluster neighborhoods serve as approximate reconstructions of real training samples, satisfying the ML03 adversarial data reconstruction criterion.
The paper's primary contribution is explicitly a black-box membership inference attack targeting generative models that produce synthetic data: the adversary determines whether specific records were in the training set by clustering synthetic samples and analyzing their dense neighborhoods.