defense 2026

Risk-Equalized Differentially Private Synthetic Data: Protecting Outliers by Controlling Record-Level Influence

Amir Asiaee¹, Chao Yan¹, Zachary B. Abrams², Bradley A. Malin¹

0 citations · 29 references · arXiv (Cornell University)


Published on arXiv: 2602.10232

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Risk-equalized DP synthesis substantially reduces MIA success against high-outlierness records compared to standard DP synthesis, with ablations confirming targeted weighting (not random downweighting) drives the improvement.

REPS (Risk-Equalized Private Synthesis)

Novel technique introduced


When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from the crowd. Differential privacy provides worst-case guarantees, but empirical attacks -- particularly membership inference -- succeed far more often against such outliers, especially under moderate privacy budgets and with auxiliary information. This paper introduces risk-equalized DP synthesis, a framework that prioritizes protection for high-risk records by reducing their influence on the learned generator. The mechanism operates in two stages: first, a small privacy budget estimates each record's "outlierness"; second, a DP learning procedure weights each record inversely to its risk score. Under Gaussian mechanisms, a record's privacy loss is proportional to its influence on the output -- so deliberately shrinking outliers' contributions yields tighter per-instance privacy bounds for precisely those records that need them most. We prove end-to-end DP guarantees via composition and derive closed-form per-record bounds for the synthesis stage (the scoring stage adds a uniform per-record term). Experiments on simulated data with controlled outlier injection show that risk-weighting substantially reduces membership inference success against high-outlierness records; ablations confirm that targeting -- not random downweighting -- drives the improvement. On real-world benchmarks (Breast Cancer, Adult, German Credit), gains are dataset-dependent, highlighting the interplay between scorer quality and synthesis pipeline.
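The two-stage mechanism described above can be sketched on toy data. Everything below is illustrative, not the paper's exact pipeline: the distance-to-noisy-mean scorer, the released statistic (a weighted mean), and the noise calibrations are all assumptions standing in for the paper's scorer and DP synthesizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 200 inliers plus 5 injected outliers (hypothetical setup,
# loosely mirroring the paper's controlled outlier injection).
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(6, 1, size=(5, 2))])
n, clip = len(X), 4.0

# --- Stage 1: spend a small budget eps1 on an outlierness score. ---
# Illustrative scorer: distance to a Gaussian-mechanism estimate of the
# clipped mean; the paper's actual scorer may differ.
eps1, delta = 0.5, 1e-5
sens1 = 2 * clip * np.sqrt(2) / n                  # crude L2 sensitivity bound
sigma1 = np.sqrt(2 * np.log(1.25 / delta)) * sens1 / eps1
mu_hat = X.clip(-clip, clip).mean(axis=0) + rng.normal(0, sigma1, size=2)
scores = np.linalg.norm(X - mu_hat, axis=1)

# --- Stage 2: weight each record inversely to its risk score. ---
# w_i = min(1, min_score / score_i): high-score (outlier) records get
# small weights, so their influence on the output shrinks.
w = np.minimum(1.0, scores.min() / scores)

# Release a weighted statistic via the Gaussian mechanism. A record's
# influence now scales with w_i, so high-risk records contribute (and
# leak) less. This noise calibration is a rough sketch only.
eps2 = 1.0
sigma2 = np.sqrt(2 * np.log(1.25 / delta)) * (2 * clip / n) / eps2
release = (w[:, None] * X.clip(-clip, clip)).sum(axis=0) / w.sum()
release += rng.normal(0, sigma2, size=2)

# The injected outliers end up with far smaller weights than typical inliers.
print(bool(w[-5:].max() < w[:200].mean()))  # → True
```

The inverse-to-risk weighting is the key design choice: under Gaussian noise, shrinking a record's weight shrinks its contribution to the mechanism's sensitivity, which is exactly what tightens its per-instance bound.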


Key Contributions

  • REPS (Risk-Equalized Private Synthesis): a two-stage framework that first scores each record's outlierness with a small DP budget, then trains a DP synthesizer weighting records inversely to their risk score to reduce per-instance privacy loss for outliers
  • Closed-form per-instance DP bounds showing that a record's privacy loss is proportional to its influence on the generator output, enabling a constructive weight schedule to cap ε_i for designated high-risk records
  • Empirical demonstration that risk-targeted downweighting (not random downweighting) substantially reduces membership inference success against outlier records on both simulated and real-world tabular benchmarks
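The second contribution, the constructive weight schedule, can be illustrated numerically. The snippet below assumes the classical Gaussian-mechanism calibration (valid for ε < 1) in which a record entering with weight w_i and base sensitivity Δ incurs ε_i = w_i · Δ · √(2 ln(1.25/δ)) / σ; the risk scores and all constants are hypothetical, and the paper's closed-form bounds may differ in detail.

```python
import numpy as np

# Per-instance Gaussian-mechanism bound (sketch): a record with weight w
# and base sensitivity Delta incurs epsilon proportional to its influence.
def eps_per_record(w, Delta, sigma, delta=1e-5):
    return w * Delta * np.sqrt(2 * np.log(1.25 / delta)) / sigma

Delta, sigma, delta = 1.0, 5.0, 1e-5

# Constructive schedule: to cap every record at eps_cap, invert the bound
# to get the largest admissible weight, then shrink further with risk.
eps_cap = 0.5
w_max = eps_cap * sigma / (Delta * np.sqrt(2 * np.log(1.25 / delta)))

risk = np.array([0.1, 0.5, 1.0, 5.0, 20.0])   # hypothetical outlierness scores
w = np.minimum(w_max, w_max / risk)           # inverse-to-risk, capped
eps_i = eps_per_record(w, Delta, sigma, delta)

print(bool(np.all(eps_i <= eps_cap + 1e-12)))  # → True: every record meets the cap
```

Because ε_i is linear in w_i, capping the weight caps the per-instance loss directly; the highest-risk record here ends up with the smallest ε_i, which is the "equalization" the framework's name refers to.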

🛡️ Threat Analysis

Membership Inference Attack

The paper's primary security contribution is defending against membership inference attacks — specifically, reducing MIA success rates against high-outlierness records by controlling each record's per-instance influence on a DP synthesizer. MIA success stratified by outlierness is the central empirical evaluation metric throughout.


Details

Domains
tabular
Model Types
traditional_ml
Threat Tags
training_time, black_box
Datasets
Breast Cancer Wisconsin, Adult, German Credit
Applications
synthetic data release, healthcare data privacy, tabular data generation