
Privacy Auditing Synthetic Data Release through Local Likelihood Attacks

Joshua Ward, Chi-Hua Wang, Guang Cheng



Published on arXiv: 2508.21146

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Gen-LRA achieves average AUC-ROC rank 1.0 across all 9 evaluated generative model architectures (AdsGAN, ARF, Bayesian Network, CTGAN, Normalizing Flows, PATEGAN, Tab-DDPM, TabSyn, TVAE), consistently outperforming six competing MIA baselines.

Gen-LRA (Generative Likelihood Ratio Attack)

Novel technique introduced


Auditing the privacy leakage of synthetic data is an important but unresolved problem. Most existing privacy auditing frameworks for synthetic data rely on heuristics and unreasonable assumptions to attack the failure modes of generative models, exhibiting limited capability to describe and detect the privacy exposure of training data through synthetic data release. In this paper, we study designing Membership Inference Attacks (MIAs) that specifically exploit the observation that tabular generative models tend to significantly overfit to certain regions of the training distribution. Here, we propose Generative Likelihood Ratio Attack (Gen-LRA), a novel, computationally efficient No-Box MIA that, with no assumption of model knowledge or access, formulates its attack by evaluating the influence a test observation has in a surrogate model's estimation of a local likelihood ratio over the synthetic data. Assessed over a comprehensive benchmark spanning diverse datasets, model architectures, and attack parameters, we find that Gen-LRA consistently dominates other MIAs for generative models across multiple performance metrics. These results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data, highlighting the significant privacy risks posed by generative model overfitting in real-world applications.


Key Contributions

  • Proposes Gen-LRA, a computationally efficient no-box MIA that uses local likelihood ratio estimation over synthetic data to infer training membership without any model access or architecture knowledge
  • Exploits the observation that tabular generative models overfit to certain regions of the training distribution as a systematic privacy leakage signal
  • Comprehensive benchmark across 9 generative model architectures showing Gen-LRA achieves average rank 1.0 in AUC-ROC, outperforming all competing no-box-comparable MIAs
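The abstract and bullets above describe the attack only in words. The sketch below shows one way a local likelihood-ratio membership score of this flavor could be instantiated, using a fixed-bandwidth Gaussian kernel density estimate as the surrogate density. The function names, the choice of KDE, the bandwidth, and the neighborhood size `k` are all illustrative assumptions for exposition, not the paper's actual estimator.

```python
import numpy as np

def kde_logpdf(points, data, bandwidth=0.5):
    """Log-density of `points` under a fixed-bandwidth Gaussian KDE on `data`.
    (Illustrative surrogate; the paper's density estimator may differ.)"""
    # (n_points, n_data) matrix of squared Euclidean distances
    sq = ((points[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    d = data.shape[1]
    log_kernel = -sq / (2 * bandwidth**2) - (d / 2) * np.log(2 * np.pi * bandwidth**2)
    # log-mean-exp over the data axis gives the mixture log-density
    return np.logaddexp.reduce(log_kernel, axis=1) - np.log(data.shape[0])

def local_lr_score(x, synthetic, reference, k=20, bandwidth=0.5):
    """Membership score for candidate x: how much does adding x to the
    surrogate density raise the likelihood of the synthetic points near x?"""
    # the k synthetic records closest to x form the local evaluation set
    dist = np.linalg.norm(synthetic - x, axis=1)
    local = synthetic[np.argsort(dist)[:k]]

    with_x = np.vstack([reference, x[None, :]])
    return float(np.sum(kde_logpdf(local, with_x, bandwidth)
                        - kde_logpdf(local, reference, bandwidth)))
```

A large positive score means the synthetic data near x is much better explained once x is added to the surrogate, which is exactly the local-overfitting signal the attack exploits; thresholding or ranking these scores yields the membership decision.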

🛡️ Threat Analysis

Membership Inference Attack

Gen-LRA is explicitly a Membership Inference Attack: it makes a binary yes/no decision about whether a specific tabular record was in the training set of a generative model. The paper benchmarks it against other MIAs and shows it achieves state-of-the-art performance, making OWASP ML04 the sole directly applicable category.
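The AUC-ROC figures reported above equal the probability that the attack scores a randomly chosen training member above a randomly chosen non-member. A minimal stdlib computation of that statistic, with made-up attack scores purely for illustration (not results from the paper):

```python
def mann_whitney_auc(member_scores, nonmember_scores):
    """AUC-ROC via the Mann-Whitney U statistic: the probability that a
    randomly drawn member outscores a randomly drawn non-member
    (ties count as half a win)."""
    wins = sum(
        1.0 if m > n else 0.5 if m == n else 0.0
        for m in member_scores
        for n in nonmember_scores
    )
    return wins / (len(member_scores) * len(nonmember_scores))

# illustrative attack scores, not results from the paper
print(mann_whitney_auc([2.3, 1.1, 0.4], [0.7, -0.2, -1.5]))  # 8/9 ≈ 0.889
```

An AUC of 0.5 means the attack is no better than guessing, while Gen-LRA's rank-1.0 AUC-ROC across architectures indicates it separates members from non-members more reliably than the competing baselines.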


Details

Domains
tabular, generative
Model Types
gan, diffusion, traditional_ml
Threat Tags
black_box, inference_time
Datasets
diverse tabular datasets (specific names not provided in excerpt)
Applications
synthetic tabular data release, privacy auditing, healthcare data sharing, financial data sharing