
PrivacyGuard: A Modular Framework for Privacy Auditing in Machine Learning

Luca Melis, Matthew Grange, Iden Kalemaj, Karan Chadha, Shengyuan Hu, Elena Kashtelyan, Will Bullock



Published on arXiv: 2510.23427

Membership Inference Attack (OWASP ML Top 10 — ML04)

Model Inversion Attack (OWASP ML Top 10 — ML03)

Key Finding

PrivacyGuard provides a unified, extensible framework integrating LiRA, RMIA, and reconstruction attacks with configurable empirical DP metrics, bridging theoretical privacy research and practical model auditing.

PrivacyGuard (novel technique introduced)


The increasing deployment of Machine Learning (ML) models in sensitive domains motivates the need for robust, practical privacy assessment tools. PrivacyGuard is a comprehensive tool for empirical differential privacy (DP) analysis, designed to evaluate privacy risks in ML models through state-of-the-art inference attacks and advanced privacy measurement techniques. To this end, PrivacyGuard implements a diverse suite of privacy attacks -- including membership inference, extraction, and reconstruction attacks -- enabling both off-the-shelf and highly configurable privacy analyses. Its modular architecture allows for the seamless integration of new attacks and privacy metrics, supporting rapid adaptation to emerging research advances. We make PrivacyGuard available at https://github.com/facebookresearch/PrivacyGuard.


Key Contributions

  • Modular PyTorch framework implementing state-of-the-art privacy attacks (LiRA, RMIA, calibration-based MIA, reconstruction attacks) as configurable, extensible modules
  • Empirical DP analysis pipeline computing privacy metrics (AUC, empirical epsilon, extraction rates) from attack outcomes rather than theoretical worst-case bounds
  • Support for both traditional supervised learning models and generative AI/LLMs, with tutorials on CIFAR-10 and Enron datasets
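The empirical-epsilon pipeline described above works backward from observed attack performance: given per-example attack scores and ground-truth membership labels, each decision threshold yields a (TPR, FPR) pair, and each pair implies a lower bound on the DP parameter epsilon. The sketch below illustrates that computation; it is a minimal, self-contained illustration, not PrivacyGuard's actual API, and the function name and signature are assumptions.

```python
import numpy as np

def empirical_epsilon(scores, is_member, delta=0.0):
    """Empirical epsilon lower bound from membership-attack scores.

    Scans every distinct score as a decision threshold. At each threshold,
    an (eps, delta)-DP mechanism constrains the attack's error rates, so
    any observed (TPR, FPR) pair certifies
        eps >= log((TPR - delta) / FPR)
    and the symmetric bound on the negative class. Returns the largest
    bound found (0.0 if the attack is no better than chance).
    """
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t  # predict "member" above the threshold
        tpr = float(np.mean(pred[is_member]))
        fpr = float(np.mean(pred[~is_member]))
        # Two one-sided bounds: on positives and on negatives.
        for num, den in ((tpr - delta, fpr), (1.0 - fpr - delta, 1.0 - tpr)):
            if den > 0 and num > den:
                best = max(best, float(np.log(num / den)))
    return best
```

A perfectly separating attack yields an unbounded epsilon in theory; with finitely many samples the bound saturates at the best ratio achievable without a zero denominator, which is why empirical epsilon is a lower bound rather than an exact estimate.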

🛡️ Threat Analysis

Model Inversion Attack

The framework implements reconstruction attacks that evaluate an adversary's ability to extract PII from generative models given access to masked training data, directly targeting training-data recovery.
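One natural metric for such attacks is an extraction rate: the fraction of held-out secrets the model reproduces verbatim when prompted with the surrounding (masked) context. The helper below is an assumed, simplified illustration of that metric, not PrivacyGuard's implementation:

```python
def extraction_rate(secrets, completions):
    """Fraction of target secrets reproduced verbatim by the model.

    `secrets` are the ground-truth masked strings (e.g. PII spans) and
    `completions` are the model's outputs for the corresponding prompts;
    a secret counts as extracted if it appears as a substring.
    """
    if not secrets:
        return 0.0
    hits = sum(1 for s, c in zip(secrets, completions) if s in c)
    return hits / len(secrets)
```

Exact substring match is the strictest criterion; a real audit might also score near-matches (edit distance, token overlap) to catch paraphrased leakage.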

Membership Inference Attack

Membership inference attacks are the primary implemented attack type, including LiRA, RMIA, and calibration-based MIAs. The framework computes MIA-derived metrics (AUC, empirical epsilon) as its core privacy assessment output.
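The core idea behind a LiRA-style score can be sketched compactly: fit a Gaussian to the target example's loss under shadow models trained with the example ("in") and without it ("out"), then score membership by the log-likelihood ratio. This is a simplified illustration of the technique, not the framework's implementation, and all names are assumptions.

```python
import math

def lira_score(target_loss, in_losses, out_losses):
    """LiRA-style membership score via a Gaussian likelihood ratio.

    `in_losses`/`out_losses` are the target example's losses under shadow
    models trained with/without it. Returns
        log p(loss | in) - log p(loss | out);
    higher values indicate the example is more likely a training member.
    """
    def mean_std(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        return mu, math.sqrt(var)

    def log_normal_pdf(x, mu, sigma):
        sigma = max(sigma, 1e-8)  # guard against degenerate shadow fits
        return -0.5 * math.log(2 * math.pi * sigma ** 2) \
               - (x - mu) ** 2 / (2 * sigma ** 2)

    mu_in, s_in = mean_std(in_losses)
    mu_out, s_out = mean_std(out_losses)
    return log_normal_pdf(target_loss, mu_in, s_in) \
         - log_normal_pdf(target_loss, mu_out, s_out)
```

Feeding these scores (over a pool of members and non-members) into an empirical-epsilon or AUC computation yields exactly the kind of MIA-derived metrics described above.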


Details

Domains
vision, nlp, generative
Model Types
llm, transformer, traditional_ml
Threat Tags
black_box, grey_box, training_time, inference_time
Datasets
CIFAR-10, Enron
Applications
ml model privacy auditing, empirical differential privacy analysis, generative ai privacy assessment