Personal Attribute Leakage in Federated Speech Models
Hamdan Al-Ali 1, Ali Reza Ghavamipour 1,2, Tommaso Caselli 3, Fatih Turkmen 3, Zeerak Talat 4, Hanan Aldarmaki 1
Published on arXiv: 2510.13357
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Accent information can be reliably inferred from all three federated ASR models; attributes underrepresented in pre-training data exhibit the highest leakage vulnerability.
Centroid-based attribute inference via weight statistics
Novel technique introduced
Federated learning is a common method for privacy-preserving training of machine learning models. In this paper, we analyze the vulnerability of ASR models to attribute inference attacks in the federated setting. We test a non-parametric white-box attack method under a passive threat model on three ASR models: Wav2Vec2, HuBERT, and Whisper. The attack operates solely on weight differentials without access to raw speech from target speakers. We demonstrate attack feasibility on sensitive demographic and clinical attributes: gender, age, accent, emotion, and dysarthria. Our findings indicate that attributes that are underrepresented or absent in the pre-training data are more vulnerable to such inference attacks. In particular, information about accents can be reliably inferred from all models. Our findings expose previously undocumented vulnerabilities in federated ASR models and offer insights towards improved security.
Key Contributions
- Demonstrates feasibility of non-parametric attribute inference attacks against federated ASR models (Wav2Vec2, HuBERT, Whisper) using only weight differentials, with no access to raw speech
- Shadow model approach: fine-tunes global model on labeled public speech to build class centroids from weight statistics, then predicts attributes via normalized Euclidean distance
- Finds that attributes underrepresented or absent in pre-training data (notably accent) are most vulnerable to leakage across all tested models
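The shadow-model attack described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adversary fine-tunes the global model on labeled public speech, averages the resulting weight differentials per attribute class to form centroids, and classifies a target client's differential by nearest centroid. The helper names (`flatten_weights`, `build_centroids`, `infer_attribute`) and the unit-norm interpretation of "normalized Euclidean distance" are assumptions for this sketch.

```python
import numpy as np

def flatten_weights(weight_diff):
    """Flatten a dict of per-layer weight differentials into one vector."""
    return np.concatenate([w.ravel() for w in weight_diff.values()])

def build_centroids(shadow_diffs, labels):
    """Average shadow-model weight differentials per attribute class.

    shadow_diffs: list of dicts mapping layer name -> weight differential array,
                  one per shadow fine-tuning run on labeled public speech.
    labels:       attribute class label for each shadow run.
    """
    centroids = {}
    for label in set(labels):
        vecs = [flatten_weights(d) for d, l in zip(shadow_diffs, labels) if l == label]
        centroids[label] = np.mean(vecs, axis=0)
    return centroids

def infer_attribute(target_diff, centroids):
    """Predict the attribute class whose centroid is nearest to the target
    client's weight differential (Euclidean distance on unit-normalized
    vectors -- one plausible reading of 'normalized Euclidean distance')."""
    v = flatten_weights(target_diff)
    v = v / (np.linalg.norm(v) + 1e-12)
    best, best_dist = None, np.inf
    for label, c in centroids.items():
        c_n = c / (np.linalg.norm(c) + 1e-12)
        d = np.linalg.norm(v - c_n)
        if d < best_dist:
            best, best_dist = label, d
    return best
```

Note that the attack is non-parametric: no classifier is trained, so it needs only a handful of labeled shadow runs per attribute class.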
🛡️ Threat Analysis
The attack reconstructs private personal attributes (gender, age, accent, emotion, dysarthria) from model weight updates in a federated learning setting. It is a weight-leakage attack in which a passive server-side adversary infers properties of private training data from the model updates clients share, without ever accessing raw audio.
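The passive adversary's vantage point is narrow: in standard federated averaging, the server already holds the global weights it broadcast and the updated weights each client returns, so the weight differential falls out of routine bookkeeping. A minimal sketch of what such a server can compute (function names are illustrative, not from the paper):

```python
import numpy as np

def weight_differential(global_weights, client_weights):
    """Per-layer weight differential a passive federated server can compute
    from a single client's returned update; no raw audio is ever seen."""
    return {name: client_weights[name] - global_weights[name]
            for name in global_weights}

def weight_statistics(diff):
    """Simple per-layer summary statistics (mean, std) of a differential --
    the kind of weight statistics a centroid-based attack could operate on."""
    return {name: (float(d.mean()), float(d.std())) for name, d in diff.items()}
```

Because these quantities are produced by the protocol itself, the attack requires no deviation from honest server behavior, which is what makes the passive threat model realistic.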