Personal Attribute Leakage in Federated Speech Models
Hamdan Al-Ali 1, Ali Reza Ghavamipour 1,2, Tommaso Caselli 3, Fatih Turkmen 3, Zeerak Talat 4, Hanan Aldarmaki 1
Published on arXiv: 2510.13357
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Accent information can be reliably inferred from all three federated ASR models; attributes underrepresented in pre-training data exhibit the highest leakage vulnerability.
Centroid-based attribute inference via weight statistics
Novel technique introduced
Federated learning is a common method for privacy-preserving training of machine learning models. In this paper, we analyze the vulnerability of ASR models to attribute inference attacks in the federated setting. We test a non-parametric white-box attack method under a passive threat model on three ASR models: Wav2Vec2, HuBERT, and Whisper. The attack operates solely on weight differentials without access to raw speech from target speakers. We demonstrate attack feasibility on sensitive demographic and clinical attributes: gender, age, accent, emotion, and dysarthria. Our findings indicate that attributes that are underrepresented or absent in the pre-training data are more vulnerable to such inference attacks. In particular, information about accents can be reliably inferred from all models. Our findings expose previously undocumented vulnerabilities in federated ASR models and offer insights towards improved security.
Key Contributions
- Demonstrates feasibility of non-parametric attribute inference attacks against federated ASR models (Wav2Vec2, HuBERT, Whisper) using only weight differentials, with no access to raw speech
- Shadow model approach: fine-tunes global model on labeled public speech to build class centroids from weight statistics, then predicts attributes via normalized Euclidean distance
- Finds that attributes underrepresented or absent in pre-training data (notably accent) are most vulnerable to leakage across all tested models
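The shadow-model attack described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adversary fine-tunes the global model on labeled public speech, averages the resulting weight differentials per attribute class to form centroids, and classifies a target client's differential by nearest centroid. The helper names (`flatten_weights`, `build_centroids`, `infer_attribute`) and the unit-norm interpretation of "normalized Euclidean distance" are assumptions for this sketch.

```python
import numpy as np

def flatten_weights(weight_diff):
    """Flatten a dict of per-layer weight differentials into one vector."""
    return np.concatenate([w.ravel() for w in weight_diff.values()])

def build_centroids(shadow_diffs, labels):
    """Average shadow-model weight differentials per attribute class.

    shadow_diffs: list of dicts mapping layer name -> weight differential array,
                  one per shadow fine-tuning run on labeled public speech.
    labels:       attribute class label for each shadow run.
    """
    centroids = {}
    for label in set(labels):
        vecs = [flatten_weights(d) for d, l in zip(shadow_diffs, labels) if l == label]
        centroids[label] = np.mean(vecs, axis=0)
    return centroids

def infer_attribute(target_diff, centroids):
    """Predict the attribute class whose centroid is nearest to the target
    client's weight differential (Euclidean distance on unit-normalized
    vectors -- one plausible reading of 'normalized Euclidean distance')."""
    v = flatten_weights(target_diff)
    v = v / (np.linalg.norm(v) + 1e-12)
    best, best_dist = None, np.inf
    for label, c in centroids.items():
        c_n = c / (np.linalg.norm(c) + 1e-12)
        d = np.linalg.norm(v - c_n)
        if d < best_dist:
            best, best_dist = label, d
    return best
```

Note that the attack is non-parametric: no classifier is trained, so it needs only a handful of labeled shadow runs per attribute class.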
🛡️ Threat Analysis
The attack reconstructs private personal attributes (gender, age, accent, emotion, dysarthria) from model weight updates in a federated learning setting. It is a weight-leakage attack in which a passive server-side adversary infers properties of private training data from the model updates clients share, without ever accessing raw audio.
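The passive adversary's vantage point is narrow: in standard federated averaging, the server already holds the global weights it broadcast and the updated weights each client returns, so the weight differential falls out of routine bookkeeping. A minimal sketch of what such a server can compute (function names are illustrative, not from the paper):

```python
import numpy as np

def weight_differential(global_weights, client_weights):
    """Per-layer weight differential a passive federated server can compute
    from a single client's returned update; no raw audio is ever seen."""
    return {name: client_weights[name] - global_weights[name]
            for name in global_weights}

def weight_statistics(diff):
    """Simple per-layer summary statistics (mean, std) of a differential --
    the kind of weight statistics a centroid-based attack could operate on."""
    return {name: (float(d.mean()), float(d.std())) for name, d in diff.items()}
```

Because these quantities are produced by the protocol itself, the attack requires no deviation from honest server behavior, which is what makes the passive threat model realistic.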