Membership Inference over Diffusion-models-based Synthetic Tabular Data

This study investigates the privacy risks associated with diffusion-based synthetic tabular data generation methods, focusing on their susceptibility to Membership Inference Attacks (MIAs). We examine two recent models, TabDDPM and TabSyn, by developing query-based MIAs based on the step-wise error comparison method. Our findings reveal that TabDDPM is more vulnerable to these attacks, whereas TabSyn exhibits resilience against our attack models. Our work underscores the importance of evaluating the privacy implications of diffusion models and encourages further research into robust privacy-preserving mechanisms for synthetic data generation.

Key Contributions

Query-based membership inference attacks using step-wise error comparison against diffusion-based tabular data generators
Comparative privacy vulnerability analysis of TabDDPM vs. TabSyn showing TabDDPM is significantly more susceptible to MIA
Evidence that DCR (Distance to Closest Record), the standard privacy metric for synthetic data, is insufficient for capturing MIA-based privacy risks
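
For readers unfamiliar with DCR, the sketch below shows one common way to compute it: for each synthetic record, take the distance to its nearest training record. It assumes purely numeric features and Euclidean distance. The function name `distance_to_closest_record` and the toy arrays are illustrative and not taken from the paper, whose exact DCR variant (distance metric, normalization, handling of categorical columns) is not stated in this summary.

```python
import numpy as np

def distance_to_closest_record(synthetic: np.ndarray, train: np.ndarray) -> np.ndarray:
    """Return, for each synthetic row, the Euclidean distance to its nearest training row."""
    # Broadcast to all pairwise differences: shape (n_synthetic, n_train, n_features).
    diffs = synthetic[:, None, :] - train[None, :, :]
    # Pairwise Euclidean distances: shape (n_synthetic, n_train).
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Keep only the closest training row for each synthetic row.
    return dists.min(axis=1)

# Toy data (hypothetical): two training rows, two synthetic rows.
train = np.array([[0.0, 0.0], [1.0, 1.0]])
synthetic = np.array([[0.0, 0.1], [3.0, 4.0]])
dcr = distance_to_closest_record(synthetic, train)
```

A DCR value measures only how close a generated row lies to the training data. It never queries the generator about a candidate record, which is one plausible reason such a metric can miss the leakage that a membership inference attack exploits.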

🛡️ Threat Analysis

Membership Inference Attack

The paper's core contribution is designing and evaluating MIAs, specifically query-based step-wise error comparison attacks, to determine whether specific records were in the training sets of the TabDDPM and TabSyn diffusion models.
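
The general idea behind an error-comparison attack can be sketched as follows: noise a candidate record to a chosen diffusion step, query the model for its noise estimate, and flag the record as a member if the estimation error falls below a threshold. This is a simplified illustration, not the paper's attack. The linear noise schedule, the step `t`, the threshold, and the memorizing toy denoiser `make_toy_model` are all assumptions introduced here so that the example runs without a trained model.

```python
import numpy as np

def alpha_bar(t: int, num_steps: int = 1000) -> float:
    """Cumulative signal-retention factor under an assumed linear beta schedule."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return float(np.prod(1.0 - betas[: t + 1]))

def stepwise_error(eps_model, x0, t, rng, num_steps=1000):
    """Squared noise-prediction error of the queried model on record x0 at step t."""
    a = alpha_bar(t, num_steps)
    eps = rng.standard_normal(x0.shape)
    # Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    eps_hat = eps_model(x_t, t)  # one query to the target model
    return float(np.sum((eps - eps_hat) ** 2))

def infer_membership(eps_model, x0, t, threshold, rng):
    """Predict 'member' when the step-wise error is below the threshold."""
    return stepwise_error(eps_model, x0, t, rng) < threshold

def make_toy_model(train, num_steps=1000):
    """Hypothetical stand-in for an overfit denoiser that memorizes its training rows."""
    def eps_model(x_t, t):
        a = alpha_bar(t, num_steps)
        # Snap the rescaled noisy input to the nearest memorized training row.
        idx = np.argmin(np.linalg.norm(train - x_t / np.sqrt(a), axis=1))
        # Noise estimate implied by assuming that row was the clean record.
        return (x_t - np.sqrt(a) * train[idx]) / np.sqrt(1.0 - a)
    return eps_model

rng = np.random.default_rng(0)
train = np.array([[0.0, 0.0], [10.0, 10.0]])
model = make_toy_model(train)

member_flag = infer_membership(model, train[0], t=50, threshold=1.0, rng=rng)
outsider_flag = infer_membership(model, np.array([3.0, -4.0]), t=50, threshold=1.0, rng=rng)
```

In this toy setting the training row yields a near-zero error while the unseen row yields a large one, so the threshold separates them. Against a real generator the gap would be far smaller, and the threshold would typically have to be calibrated on records whose membership status is known.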