Published on arXiv

2508.06789

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

ULIA achieves 100% attack success rate under class-level and client-level unlearning in IID settings, retaining 62.3–93% ASR even when only 1% of a user's local data is unlearned.

ULIA

Novel technique introduced


Federated Unlearning (FU) has emerged as a promising way to honor clients' right to be forgotten, allowing them to erase their data from a global model without compromising model performance. Unfortunately, the parameter variations that FU induces in the model expose information about clients' data, enabling an attacker to infer the labels of the unlearned data; yet label inference attacks against FU remain unexplored. In this paper, we introduce and analyze this new privacy threat against FU and propose a novel label inference attack, ULIA, which can infer the labels of unlearned data across all three FU levels. To address the unique challenge of inferring labels from model variations, we design a gradient-label mapping mechanism in ULIA that ties gradient variations to unlearned labels, enabling label inference from accumulated model variations. We evaluate ULIA in both IID and non-IID settings. Experimental results show that in the IID setting, ULIA achieves a 100% Attack Success Rate (ASR) under both class-level and client-level unlearning. Even when only 1% of a user's local data is forgotten, ULIA still attains an ASR between 62.3% and 93%.


Key Contributions

  • Introduces ULIA, the first label inference attack targeting federated unlearning across all three FU granularities (sample-level, class-level, client-level)
  • Designs a gradient-label mapping mechanism that recovers per-label parameter shifts from the accumulated model differences produced by unlearning, enabling accurate label inference (a minimal sketch follows this list)
  • Demonstrates 100% ASR in IID settings under class- and client-level unlearning, and 62.3–93% ASR even when only 1% of a user's local data is forgotten
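The gradient-label mapping idea can be illustrated with a small sketch. The snippet below is a hypothetical reconstruction, not the paper's ULIA algorithm: it assumes the attacker (the server) holds model snapshots from before and after unlearning plus a batch of attacker-chosen auxiliary inputs `aux_x`, and that the classifier head is exposed as `model.fc`. The cosine-similarity scoring is this sketch's choice of alignment measure.

```python
import torch
import torch.nn.functional as F

def label_gradient(model, aux_x, label):
    """Reference gradient of the loss w.r.t. the classifier head
    (`model.fc`, an assumption of this sketch) for one candidate label."""
    model.zero_grad()
    logits = model(aux_x)
    y = torch.full((aux_x.size(0),), label, dtype=torch.long)
    F.cross_entropy(logits, y).backward()
    return torch.cat([model.fc.weight.grad.flatten(),
                      model.fc.bias.grad.flatten()])

def infer_unlearned_label(model_before, model_after, aux_x, num_classes):
    # Parameter shift of the classifier head induced by unlearning.
    delta = torch.cat([
        (model_after.fc.weight - model_before.fc.weight).flatten(),
        (model_after.fc.bias - model_before.fc.bias).flatten(),
    ])
    # Unlearning roughly reverses the descent steps contributed by the
    # forgotten data, so the shift should align best with the gradient
    # direction of the unlearned label.
    scores = [
        F.cosine_similarity(delta,
                            label_gradient(model_before, aux_x, c),
                            dim=0).item()
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: scores[c])
```

In this framing, class-level unlearning is the easy case (the whole shift points along one label's gradient), while sample-level unlearning of a small fraction of data produces a weaker, noisier shift, which is consistent with the ASR dropping toward 62.3% at the 1% setting.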

🛡️ Threat Analysis

Model Inversion Attack

The core attack recovers private data attributes (labels) from model parameter variations: the adversary, typically the server, reconstructs label information from the gradient and parameter differences between pre- and post-unlearning model snapshots. This is directly analogous to gradient leakage attacks in federated learning, which reconstruct private training-data attributes from shared updates.
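For context, the gradient-leakage analogy rests on a well-known observation from attacks such as iDLG: with a softmax cross-entropy loss and a single sample, the gradient of the loss with respect to the final-layer bias is negative only at the true class. The toy snippet below demonstrates that observation; the linear model and random data are placeholders, not anything from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(16, 5)   # stand-in for a model's classifier head
x = torch.randn(1, 16)     # one private sample
y = torch.tensor([3])      # its private label

loss = F.cross_entropy(model(x), y)
bias_grad, = torch.autograd.grad(loss, [model.bias])

# dL/db_j = softmax_j - 1[j == y], so only the true class's entry is
# negative; the uniquely negative entry leaks the label.
inferred = int(torch.argmin(bias_grad))
assert inferred == y.item()
print("leaked label:", inferred)
```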


Details

Domains
federated-learning
Model Types
federated
Threat Tags
white_box, training_time, targeted
Applications
federated learning, federated unlearning systems