PrivacyGuard: A Modular Framework for Privacy Auditing in Machine Learning
Luca Melis, Matthew Grange, Iden Kalemaj, Karan Chadha, Shengyuan Hu, Elena Kashtelyan, Will Bullock
Published on arXiv (arXiv:2510.23427)
Membership Inference Attack
OWASP ML Top 10 — ML04
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
PrivacyGuard provides a unified, extensible framework integrating LiRA, RMIA, and reconstruction attacks with configurable empirical DP metrics, bridging theoretical privacy research and practical model auditing.
PrivacyGuard
Novel technique introduced
The increasing deployment of Machine Learning (ML) models in sensitive domains motivates the need for robust, practical privacy assessment tools. PrivacyGuard is a comprehensive tool for empirical differential privacy (DP) analysis, designed to evaluate privacy risks in ML models through state-of-the-art inference attacks and advanced privacy measurement techniques. To this end, PrivacyGuard implements a diverse suite of privacy attacks -- including membership inference, extraction, and reconstruction attacks -- enabling both off-the-shelf and highly configurable privacy analyses. Its modular architecture allows for the seamless integration of new attacks and privacy metrics, supporting rapid adaptation to emerging research advances. We make PrivacyGuard available at https://github.com/facebookresearch/PrivacyGuard.
Key Contributions
- Modular PyTorch framework implementing state-of-the-art privacy attacks (LiRA, RMIA, calibration-based MIA, reconstruction attacks) as configurable, extensible modules
- Empirical DP analysis pipeline computing privacy metrics (AUC, empirical epsilon, extraction rates) from attack outcomes rather than theoretical worst-case bounds
- Support for both traditional supervised learning models and generative AI/LLMs, with tutorials on CIFAR-10 and Enron datasets
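The empirical DP pipeline above turns attack outcomes into privacy metrics. A minimal sketch of the two core metrics follows; the function names are illustrative, not PrivacyGuard's actual API. The empirical-epsilon bound comes from the standard hypothesis-testing view of DP: any (eps, 0)-DP mechanism forces TPR <= e^eps * FPR and (1 - FPR) <= e^eps * (1 - TPR), so an attack's observed operating point certifies a lower bound on eps.

```python
import math


def empirical_epsilon(tpr: float, fpr: float) -> float:
    """Lower bound on epsilon implied by an attack's (TPR, FPR) point.

    eps >= max(log(TPR/FPR), log((1-FPR)/(1-TPR))), from the
    hypothesis-testing characterization of (eps, 0)-DP.
    """
    candidates = []
    if 0 < fpr and 0 < tpr:
        candidates.append(math.log(tpr / fpr))
    if tpr < 1 and fpr < 1:
        candidates.append(math.log((1 - fpr) / (1 - tpr)))
    return max(candidates) if candidates else float("inf")


def attack_auc(member_scores, nonmember_scores) -> float:
    """AUC as the probability a member's attack score beats a non-member's
    (ties count half). O(n*m) pairwise form, fine for a sketch."""
    wins = sum(
        (m > n) + 0.5 * (m == n)
        for m in member_scores
        for n in nonmember_scores
    )
    return wins / (len(member_scores) * len(nonmember_scores))
```

An attack achieving 90% TPR at 10% FPR, for example, certifies eps >= log(9) ≈ 2.2, often far below a model's theoretical worst-case accounting bound.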
🛡️ Threat Analysis
Membership inference attacks are the primary implemented attack type, including LiRA, RMIA, and calibration-based MIAs. The framework computes MIA-derived metrics (AUC, empirical epsilon) as its core privacy assessment output.
The framework also implements reconstruction attacks that evaluate an adversary's ability to extract PII from generative models given access to masked training data, directly targeting training data recovery.
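The LiRA scoring step named in the threat analysis can be sketched as a per-example Gaussian likelihood ratio (in the style of Carlini et al.); this is an illustrative standalone sketch, not PrivacyGuard's API. Shadow models trained with and without the target example yield two confidence distributions, and the score is the log-likelihood ratio of the observed confidence under the two fits.

```python
import math
from statistics import NormalDist


def lira_score(target_conf: float, in_confs, out_confs) -> float:
    """Likelihood-ratio membership score for one example.

    in_confs:  the example's model confidences under shadow models
               trained WITH it; out_confs: under shadow models
               trained WITHOUT it. Positive score -> likely member.
    """
    def fit(xs):
        # Fit a Gaussian to the shadow-model confidences.
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs) + 1e-8
        return NormalDist(mu, math.sqrt(var))

    p_in = fit(in_confs).pdf(target_conf)
    p_out = fit(out_confs).pdf(target_conf)
    # Small floor avoids log(0) when a density underflows.
    return math.log(p_in + 1e-30) - math.log(p_out + 1e-30)
```

Scores across a test pool then feed directly into AUC and empirical-epsilon computation; calibration-based MIAs differ mainly in replacing the shadow-model fits with a reference-model baseline.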