Membership Inference over Diffusion-models-based Synthetic Tabular Data
Published on arXiv: 2510.16037
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
TabDDPM is substantially more vulnerable to query-based membership inference attacks than TabSyn, which exhibits notable resilience against the proposed step-wise error comparison attack.
Novel technique introduced: Step-wise Error Comparison MIA
This study investigates the privacy risks of diffusion-based synthetic tabular data generation methods, focusing on their susceptibility to Membership Inference Attacks (MIAs). We examine two recent models, TabDDPM and TabSyn, by developing query-based MIAs based on the step-wise error comparison method. Our findings reveal that TabDDPM is more vulnerable to these attacks, whereas TabSyn exhibits resilience against our attack models. Our work underscores the importance of evaluating the privacy implications of diffusion models and encourages further research into robust privacy-preserving mechanisms for synthetic data generation.
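The intuition behind an error-based query attack can be illustrated with a minimal sketch. It is not the paper's exact procedure: it assumes purely numeric features, a Gaussian forward process, and a noise-prediction query interface. The names `stepwise_errors`, `membership_scores`, and `make_memorizing_denoiser` are hypothetical, and the "model" is a toy stand-in that memorizes its training rows, used only to make the effect visible.

```python
import numpy as np

def stepwise_errors(x0, predict_noise, alpha_bar, timesteps, rng):
    """Squared denoising error per timestep (rows) and per record (columns)."""
    errors = np.empty((len(timesteps), x0.shape[0]))
    for i, t in enumerate(timesteps):
        eps = rng.standard_normal(x0.shape)
        # Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
        x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        eps_hat = predict_noise(x_t, t)  # one query to the target model
        errors[i] = np.sum((eps - eps_hat) ** 2, axis=1)
    return errors

def membership_scores(errors):
    # Lower average error suggests membership; negate so higher = more member-like.
    return -errors.mean(axis=0)

def make_memorizing_denoiser(train, alpha_bar):
    """Toy stand-in (assumption): denoises by snapping to the nearest training row."""
    def predict_noise(x_t, t):
        a = alpha_bar[t]
        x0_guess = x_t / np.sqrt(a)
        d = ((x0_guess[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
        nearest = train[d.argmin(axis=1)]
        return (x_t - np.sqrt(a) * nearest) / np.sqrt(1.0 - a)
    return predict_noise

rng = np.random.default_rng(0)
train = rng.standard_normal((50, 8))      # records the toy model has seen
holdout = rng.standard_normal((50, 8))    # records it has not seen
alpha_bar = np.linspace(0.99, 0.9, 5)     # illustrative low-noise schedule
model = make_memorizing_denoiser(train, alpha_bar)

member_scores = membership_scores(
    stepwise_errors(train, model, alpha_bar, range(5), rng))
nonmember_scores = membership_scores(
    stepwise_errors(holdout, model, alpha_bar, range(5), rng))
print(member_scores.mean() > nonmember_scores.mean())
```

In a real attack, a threshold or a small classifier over the per-timestep errors would turn these scores into member or non-member decisions. A real generator will separate the two groups far less cleanly than this memorizing toy does.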
Key Contributions
- Query-based membership inference attacks using step-wise error comparison against diffusion-based tabular data generators
- Comparative privacy vulnerability analysis of TabDDPM vs. TabSyn showing TabDDPM is significantly more susceptible to MIA
- Evidence that DCR (Distance to Closest Record), the standard privacy metric for synthetic data, is insufficient for capturing MIA-based privacy risks
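For reference, DCR can be computed along the following lines. This is a minimal sketch that assumes numeric features and Euclidean distance; practical implementations differ in the distance used and in how features are scaled or encoded. The function name `dcr` is hypothetical.

```python
import numpy as np

def dcr(synthetic, reference):
    """Distance from each synthetic row to its closest row in `reference`."""
    diffs = synthetic[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))  # pairwise Euclidean distances
    return dists.min(axis=1)

reference = np.array([[0.0, 0.0], [3.0, 4.0]])  # e.g. training records
synthetic = np.array([[0.0, 0.0], [3.0, 0.0]])  # e.g. generated records
distances = dcr(synthetic, reference)
print(distances)  # [0. 3.]
```

DCR is computed only from the released synthetic rows, whereas a query-based MIA probes how the model behaves on a specific candidate record, so the two measure different things.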
🛡️ Threat Analysis
The paper's central contribution is the design and evaluation of MIAs, specifically query-based step-wise error comparison attacks, to determine whether specific records were in the training sets of the TabDDPM and TabSyn diffusion models.