Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao , Wei Qian , Aobo Chen , Mengdi Huai
Published on arXiv
arXiv:2508.07066
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Provides statistically rigorous FDR control for membership inference attacks in black-box settings while functioning as a plug-in wrapper for existing MIA methods.
Recent studies have shown that deep learning models are vulnerable to membership inference attacks (MIAs), which aim to infer whether a data record was used to train a target model. Various MIA methods have been proposed to analyze and study these vulnerabilities. Despite the significance and popularity of MIAs, existing work offers few guarantees on the false discovery rate (FDR), the expected proportion of false discoveries among the identified positive discoveries. Providing such guarantees is challenging because the underlying data distribution is usually unknown and the estimated non-member probabilities often exhibit interdependence. To tackle these challenges, this paper designs a novel membership inference attack method with provable FDR guarantees. The method additionally provides a marginal probability guarantee on labeling true non-member data as member data. Notably, it works as a wrapper that integrates seamlessly with existing MIA methods in a post-hoc manner while retaining FDR control. A theoretical analysis is provided, and extensive experiments in various settings (e.g., the black-box setting and the lifelong learning setting) verify the method's desirable performance.
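The abstract does not spell out the paper's selection rule, but FDR control over many per-record membership decisions is classically achieved with step-up procedures on p-values. As an illustration only (not the paper's method), here is a minimal sketch using the Benjamini–Yekutieli correction, which remains valid under the arbitrary dependence among non-member probabilities that the abstract highlights as a challenge:

```python
import numpy as np

def fdr_select(pvals, alpha=0.05, dependent=True):
    """Flag records as 'member' while controlling FDR at level alpha.

    Benjamini-Hochberg step-up procedure; with dependent=True, the
    Benjamini-Yekutieli correction is applied, which is valid under
    arbitrary dependence among the p-values. Illustrative sketch only,
    not the procedure proposed in the paper.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    # BY shrinks the target level by the m-th harmonic number.
    level = alpha / np.sum(1.0 / np.arange(1, m + 1)) if dependent else alpha
    order = np.argsort(pvals)
    thresholds = level * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    selected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest passing rank
        selected[order[: k + 1]] = True   # reject all smaller p-values too
    return selected

# Hypothetical p-values under the null "this record is a non-member",
# e.g. derived from an existing MIA's scores via a calibration set.
pvals = np.array([0.001, 0.002, 0.30, 0.04, 0.80, 0.003])
flags = fdr_select(pvals, alpha=0.05)  # True = declared a member
```

Here only the three smallest p-values survive the step-up thresholds; the borderline 0.04 is rejected once the BY harmonic-number correction shrinks the working level.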
Key Contributions
- Novel MIA framework that provides provable false discovery rate (FDR) and marginal probability guarantees on incorrectly labeling non-members as members
- Post-hoc wrapper design that integrates seamlessly with existing MIA methods without retraining or modifying them
- Theoretical analysis of FDR guarantees and empirical validation across black-box and lifelong learning settings
🛡️ Threat Analysis
The paper's sole focus is membership inference attacks — determining whether specific data records were in a model's training set — with a novel contribution of FDR control guarantees on top of existing MIA methods.