The Hidden Cost of Modeling P(X): Vulnerability to Membership Inference Attacks in Generative Text Classifiers
Owais Makroo 1,2, Siva Rajesh Kasa 2, Sumegh Roychowdhury 2, Karan Gupta 2, Nikhil Pattisapu 2, Santhosh Kasa 2, Sumit Negi 2
Published on arXiv (arXiv:2510.16122)
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Fully generative classifiers that explicitly model the joint likelihood P(X,Y) are consistently the most vulnerable to membership inference attacks across all evaluated datasets and MIA strategies.
Membership Inference Attacks (MIAs) pose a critical privacy threat by enabling adversaries to determine whether a specific sample was included in a model's training dataset. Despite extensive research on MIAs, systematic comparisons between generative and discriminative classifiers remain limited. This work addresses that gap by first providing theoretical motivation for why generative classifiers exhibit heightened susceptibility to MIAs, then validating these insights through comprehensive empirical evaluation. Our study encompasses discriminative, generative, and pseudo-generative text classifiers across varying training data volumes, evaluated on nine benchmark datasets. Employing a diverse array of MIA strategies, we consistently demonstrate that fully generative classifiers, which explicitly model the joint likelihood $P(X,Y)$, are the most vulnerable to membership leakage. Furthermore, we observe that the canonical inference approach commonly used in generative classifiers significantly amplifies this privacy risk. These findings reveal a fundamental utility-privacy trade-off inherent in classifier design, underscoring the critical need for caution when deploying generative classifiers in privacy-sensitive applications. Our results motivate future research directions in developing privacy-preserving generative classifiers that maintain utility while mitigating membership inference vulnerabilities.
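To make the threat model concrete, a minimal sketch of one of the simplest MIA strategies is the loss-threshold attack: because models tend to fit their training samples more tightly, an adversary who can observe per-sample loss (or log-likelihood) can flag low-loss samples as likely training members. The function name and the example loss values below are illustrative assumptions, not from the paper.

```python
def loss_threshold_mia(per_sample_losses, threshold):
    """Loss-threshold membership inference (illustrative sketch):
    predict 'member' (True) when a sample's loss is below the
    threshold, exploiting the gap between train and test loss."""
    return [loss < threshold for loss in per_sample_losses]

# Hypothetical per-sample losses from some text classifier:
# training members typically score lower than unseen samples.
member_losses = [0.05, 0.10, 0.02]
non_member_losses = [1.20, 0.90, 1.50]

predictions = loss_threshold_mia(member_losses + non_member_losses, threshold=0.5)
# Members are flagged True and non-members False under this threshold.
```

The attack's success hinges on how large the train/test loss gap is, which is exactly why classifiers that assign explicit likelihoods to inputs (rather than only to labels) leak more.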
Key Contributions
- Theoretical framework using total variation distance to prove that generative classifiers (which model the joint P(X,Y)) admit a higher upper bound on MIA success than discriminative classifiers
- Comprehensive empirical evaluation across discriminative, generative, and pseudo-generative text classifiers on nine benchmark datasets using diverse MIA strategies
- Identifies that the canonical inference approach in generative classifiers further amplifies membership leakage, exposing a fundamental utility-privacy trade-off in classifier design
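The canonical inference approach the contributions refer to can be sketched as follows: a fully generative classifier predicts via $\arg\max_y P(X,Y) = \arg\max_y P(X \mid Y)\,P(Y)$, so every prediction requires computing per-class joint log-likelihoods of the input itself. These are exactly the per-sample scores a likelihood-based MIA can threshold, which is the intuition behind the amplified leakage. The function and the example numbers below are illustrative assumptions, not the paper's implementation.

```python
import math

def generative_predict(log_px_given_y, log_prior):
    """Canonical inference for a fully generative classifier
    (illustrative sketch): pick argmax_y [log P(X|Y) + log P(Y)].
    Note that the joint log-likelihoods computed here expose how
    well the model 'knows' the input X, a direct MIA signal."""
    joint = {y: log_px_given_y[y] + log_prior[y] for y in log_prior}
    return max(joint, key=joint.get), joint

# Hypothetical class-conditional log-likelihoods for one text sample
# under a two-class sentiment model with a uniform prior.
log_px_given_y = {"pos": -12.3, "neg": -15.8}
log_prior = {"pos": math.log(0.5), "neg": math.log(0.5)}
label, joint = generative_predict(log_px_given_y, log_prior)
```

A discriminative classifier, by contrast, only outputs $P(Y \mid X)$ and never needs to score the input's own likelihood, which is the root of the utility-privacy trade-off the paper identifies.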
🛡️ Threat Analysis
The paper focuses entirely on Membership Inference Attacks: it theoretically proves and empirically validates that fully generative classifiers modeling P(X,Y) are significantly more susceptible to MIAs than discriminative classifiers across nine NLP benchmarks.