
Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend

Yige Liu 1, Dexuan Xu 1, Zimai Guo 1, Yongzhi Cao 1,2, Hanpin Wang 1



Published on arXiv

2603.18680

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Advancing the cut layer forward by three layers reduces label inference attack accuracy to the level of random guessing while maintaining VFL model utility

Cut Layer Adjustment Defense

Novel technique introduced


Vertical federated learning (VFL) allows an active party holding a top model and multiple passive parties holding bottom models to collaborate. In this scenario, passive parties possessing only features may attempt to infer the active party's private labels, making label inference attacks (LIAs) a significant threat. Previous LIA studies have claimed that well-trained bottom models can effectively represent labels. However, we demonstrate that this view is misleading and exposes the vulnerability of existing LIAs. By leveraging mutual information, we present the first observation of the "model compensation" phenomenon in VFL. We theoretically prove that, in VFL, the mutual information between layer outputs and labels increases with layer depth, indicating that bottom models primarily extract feature information while the top model handles label mapping. Building on this insight, we introduce task reassignment to show that the success of existing LIAs actually stems from the distribution alignment between features and labels. When this alignment is disrupted, the performance of LIAs declines sharply or fails entirely. We also investigate the implications of this insight for defenses and propose a zero-overhead defense technique based on layer adjustment. Extensive experiments across five datasets and five representative model architectures indicate that shifting cut layers forward, so that the top model accounts for a larger share of the entire model, not only improves resistance to LIAs but also enhances other defenses.
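The cut-layer idea in the abstract can be made concrete with a toy split model. The sketch below is hypothetical (not the paper's code): a stack of layers is partitioned at a cut index, with everything before the cut acting as a passive party's bottom model and everything after as the active party's top model. Shifting the cut toward the input, as the defense proposes, moves layers out of the bottom model and into the top model without changing the end-to-end function. All names (`make_layer`, `split_model`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(d_in, d_out):
    """A random ReLU layer; stands in for one trained VFL layer."""
    W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
    return lambda x: np.maximum(x @ W, 0.0)

def _apply(fs, x):
    for f in fs:
        x = f(x)
    return x

def split_model(layers, cut):
    """Bottom model = layers[:cut] (passive party); top = layers[cut:] (active party)."""
    bottom = lambda x: _apply(layers[:cut], x)
    top = lambda h: _apply(layers[cut:], h)
    return bottom, top

layers = [make_layer(8, 8) for _ in range(5)] + [make_layer(8, 3)]
x = rng.normal(size=(4, 8))
full_out = _apply(layers, x)

# Default cut vs. the same cut advanced 3 layers toward the input:
for cut in (4, 1):
    bottom, top = split_model(layers, cut)
    h = bottom(x)        # embedding the passive party exposes (and an LIA exploits)
    y = top(h)           # label mapping kept by the active party
    assert np.allclose(y, full_out)  # end-to-end function is unchanged by the split
    print(f"cut={cut}: top model holds {len(layers) - cut} of {len(layers)} layers")
```

Because the composition is identical for any cut, model utility is unaffected by where the split sits; what changes is how much label-relevant computation the passive party ever sees, which is the lever the defense pulls.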


Key Contributions

  • First observation of 'model compensation' phenomenon in VFL using mutual information analysis showing bottom models focus on feature extraction while top models handle label mapping
  • Demonstrates existing label inference attacks succeed due to feature-label distribution alignment rather than bottom model label representation capability
  • Proposes zero-overhead defense via cut layer adjustment that increases top model proportion, reducing LIA accuracy to random guessing

🛡️ Threat Analysis

Model Inversion Attack

Label inference attacks aim to extract private label information from the active party by exploiting embeddings produced by bottom models. This is a privacy attack in which the adversary (a passive party) attempts to infer private training data (labels) from model outputs.
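A minimal stand-in for this threat, assuming the common LIA setting where the passive party knows a few leaked labels: it can match the rest of its bottom-model embeddings to the nearest labeled anchor. This toy nearest-centroid inference (not the paper's attack, which analyzes fine-tuning-based "model completion" LIAs) works precisely when embeddings align with the label distribution, which is the alignment the paper shows these attacks depend on.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy embeddings: two separated clusters stand in for a bottom model whose
# outputs happen to align with the (private) label distribution.
emb = np.vstack([
    rng.normal(loc=-2.0, size=(50, 4)),   # samples with label 0
    rng.normal(loc=+2.0, size=(50, 4)),   # samples with label 1
])
labels = np.array([0] * 50 + [1] * 50)    # held only by the active party

# Attacker knows the labels of just two samples and uses them as anchors.
anchors = np.stack([emb[0], emb[99]])     # index 0 -> class 0, index 99 -> class 1
dists = ((emb[:, None, :] - anchors[None]) ** 2).sum(axis=-1)
guess = np.argmin(dists, axis=1)          # inferred label per sample

accuracy = (guess == labels).mean()
print(f"inferred-label accuracy: {accuracy:.2f}")
```

If the embedding clusters did not track the labels (as after the paper's cut-layer or task-reassignment interventions), the same procedure would degrade toward random guessing.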


Details

Domains
federated-learning
Model Types
federated, cnn, traditional_ml
Threat Tags
training_time, grey_box
Datasets
Three benchmark datasets and two real-world datasets (specific names not provided in abstract)
Applications
vertical federated learning