
Observational Auditing of Label Privacy

Iden Kalemaj, Luca Melis, Maxime Boucher, Ilya Mironov, Saeed Mahloujifar

0 citations · 51 references · arXiv


Published on arXiv · 2511.14084

Model Inversion Attack

OWASP ML Top 10 — ML03

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Demonstrates effective empirical auditing of label privacy guarantees on Criteo and CIFAR-10 without any modifications to the training data pipeline, enabling practical deployment in large-scale production environments.

Observational Auditing Framework

Novel technique introduced


Differential privacy (DP) auditing is essential for evaluating privacy guarantees in machine learning systems. Existing auditing methods, however, pose a significant challenge for large-scale systems, since they require modifying the training dataset (for instance, by injecting out-of-distribution canaries or removing samples from training). Such interventions on the training data pipeline are resource-intensive and involve considerable engineering overhead. We introduce a novel observational auditing framework that leverages the inherent randomness of data distributions, enabling privacy evaluation without altering the original dataset. Our approach extends privacy auditing beyond traditional membership inference to protected attributes, with labels as a special case, addressing a key gap in existing techniques. We provide theoretical foundations for our method and perform experiments on Criteo and CIFAR-10 datasets that demonstrate its effectiveness in auditing label privacy guarantees. This work opens new avenues for practical privacy auditing in large-scale production environments.
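Independent of this paper's specific framework, DP audits typically end the same way: the attack's error rates are converted into an empirical lower bound on epsilon via the hypothesis-testing characterization of DP (any (ε, δ)-DP mechanism satisfies FPR + e^ε·FNR ≥ 1 − δ and e^ε·FPR + FNR ≥ 1 − δ). The sketch below shows that standard point-estimate conversion; the function name and default δ are illustrative, and a production audit would add confidence intervals (e.g., Clopper-Pearson) around the rates.

```python
import math

def empirical_epsilon(fpr: float, fnr: float, delta: float = 1e-5) -> float:
    """Point-estimate lower bound on epsilon implied by an attack's
    false-positive rate (fpr) and false-negative rate (fnr).

    Uses the standard DP hypothesis-testing constraints:
        fpr + e^eps * fnr >= 1 - delta
        e^eps * fpr + fnr >= 1 - delta
    Illustrative helper, not the paper's exact estimator.
    """
    candidates = []
    for a, b in ((fpr, fnr), (fnr, fpr)):
        if a > 0 and 1 - delta - b > 0:
            candidates.append(math.log((1 - delta - b) / a))
    # A random-guessing attack certifies no privacy violation: clamp at 0.
    return max([0.0] + candidates)

# An attack with 5% FPR and 60% FNR certifies roughly eps >= 2.08;
# a coin-flip attack (50%/50%) certifies nothing.
print(round(empirical_epsilon(0.05, 0.60), 2))  # → 2.08
print(empirical_epsilon(0.5, 0.5))              # → 0.0
```

The clamp at zero matters in practice: weak attacks produce negative candidates, which would otherwise be reported as (meaningless) negative epsilon bounds.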


Key Contributions

  • Observational auditing framework that evaluates label/attribute privacy guarantees without modifying the training dataset or injecting canaries
  • Theoretical extension of DP auditing beyond membership inference to protected attributes (labels as a special case)
  • Empirical validation on Criteo and CIFAR-10 demonstrating practical effectiveness in large-scale settings

🛡️ Threat Analysis

Model Inversion Attack

Label/attribute inference (recovering a training sample's label or protected attribute from a model) is the central threat audited in this work. It is a form of attribute inference, related to model inversion, in which the adversary reconstructs private attributes such as labels from the trained model's behavior.

Membership Inference Attack

The framework explicitly extends traditional membership inference auditing; evaluating whether DP guarantees hold against membership inference adversaries is a central component of the methodology.
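To make the membership-inference baseline concrete, the simplest attack an auditor might run is a loss threshold: samples the model fits well (low loss) are guessed to be training members. The data and threshold below are synthetic toy values, not results from the paper; the point is only the shape of the test whose error rates feed a DP audit.

```python
def loss_threshold_attack(losses, threshold):
    """Predict 'member' for each sample whose loss falls below the threshold."""
    return [loss < threshold for loss in losses]

# Synthetic toy losses: members tend to be fit well, non-members poorly.
member_losses = [0.1, 0.3, 0.2]
nonmember_losses = [1.2, 0.9, 1.5]

preds = loss_threshold_attack(member_losses + nonmember_losses, threshold=0.5)
tpr = sum(preds[:3]) / 3   # true-positive rate on members
fpr = sum(preds[3:]) / 3   # false-positive rate on non-members
print(tpr, fpr)  # → 1.0 0.0 on this cleanly separated toy data
```

On real models the loss distributions overlap, so the attack's TPR/FPR trade-off (and hence the certified epsilon) is far less favorable than in this toy example.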


Details

Domains
vision, tabular
Model Types
cnn, traditional_ml
Threat Tags
training_time, black_box
Datasets
Criteo, CIFAR-10
Applications
differential privacy auditing, label privacy, attribute inference