defense 2026

Decoupling Generalizability and Membership Privacy Risks in Neural Networks

Xingli Fang, Jung-Eun Kim

0 citations · 59 references · arXiv (Cornell University)


Published on arXiv

2602.02296

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

PPTP maintains model generalizability significantly better than prior defenses while enhancing protection against membership inference attacks.

PPTP (Privacy-Preserving Training Principle)

Novel technique introduced


A deep learning model usually has to sacrifice some utility when it acquires other abilities or characteristics. Privacy preservation has such a trade-off relationship with utility. The loss disparity between various defense approaches implies the potential to decouple generalizability and privacy risks to maximize privacy gain. In this paper, we identify that a model's generalization and privacy risks reside in different regions of deep neural network architectures. Based on these observations, we propose the Privacy-Preserving Training Principle (PPTP) to protect model components from privacy risks while minimizing the loss in generalizability. Through extensive evaluations, our approach maintains model generalizability significantly better than prior defenses while enhancing privacy preservation.


Key Contributions

  • Identifies that generalization and membership privacy risks are localized in different architectural regions of deep neural networks
  • Proposes Privacy-Preserving Training Principle (PPTP) that selectively protects high-risk model components to decouple privacy and utility
  • Demonstrates a significantly better utility-privacy tradeoff than existing membership inference defenses

🛡️ Threat Analysis

Membership Inference Attack

The paper explicitly targets "membership privacy risks": the threat that an adversary determines whether a specific sample was in the training set. PPTP is a training-time defense against membership inference attacks that decouples this risk from generalizability.
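
To make the threat concrete, here is a minimal sketch of a simple loss-threshold membership inference attack, which guesses that a sample was in the training set when the model's loss on it is low. This is a generic baseline attack, not the attack suite or evaluation protocol used in the paper. The function names, the threshold value, and the synthetic loss distributions are all assumptions made for illustration.

```python
import math
import random


def cross_entropy(probs, label):
    """Per-sample cross-entropy loss from predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) whenever the per-sample loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_advantage(member_losses, nonmember_losses, threshold):
    """True-positive rate minus false-positive rate of the threshold attack."""
    tpr = sum(loss_threshold_attack(member_losses, threshold)) / len(member_losses)
    fpr = sum(loss_threshold_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr - fpr


# Synthetic stand-ins for real model losses (assumption): training samples
# tend to have lower loss than held-out samples when a model overfits.
rng = random.Random(0)
member_losses = [rng.expovariate(10.0) for _ in range(1000)]    # mean loss 0.1
nonmember_losses = [rng.expovariate(1.0) for _ in range(1000)]  # mean loss 1.0

advantage = attack_advantage(member_losses, nonmember_losses, threshold=0.3)
print(f"attack advantage: {advantage:.3f}")
```

An advantage near 0 means the attacker does no better than guessing, while a value near 1 means membership is almost fully exposed. A defense in this setting aims to push the advantage toward 0 without lowering test accuracy.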


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time
Applications
image classification, neural network training