Breaking Privacy in Federated Clustering: Perfect Input Reconstruction via Temporal Correlations
Guang Yang 1, Lixia Luo 2, Qiongxiu Li 3
Published on arXiv: 2511.07073
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
TAR achieves perfect (exact) reconstruction of private training inputs from disclosed intermediate centroids, with high success rates across diverse datasets, refuting the prevailing view that centroid release is privacy-safe.
TAR (Trajectory-Aware Reconstruction)
Novel technique introduced
Federated clustering allows multiple parties to discover patterns in distributed data without sharing raw samples. To reduce overhead, many protocols disclose intermediate centroids during training. While often treated as harmless for efficiency, whether such disclosure compromises privacy remains an open question. Prior analyses modeled the problem as a so-called Hidden Subset Sum Problem (HSSP) and argued that centroid release may be safe, since classical HSSP attacks fail to recover inputs. We revisit this question and uncover a new leakage mechanism: temporal regularities in $k$-means iterations create exploitable structure that enables perfect input reconstruction. Building on this insight, we propose Trajectory-Aware Reconstruction (TAR), an attack that combines temporal assignment information with algebraic analysis to recover exact original inputs. Our findings provide the first rigorous evidence, supported by a practical attack, that centroid disclosure in federated clustering significantly compromises privacy, exposing a fundamental tension between privacy and efficiency.
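The disclosure channel the abstract describes can be sketched as a minimal federated k-means loop in which parties keep their raw points local and share only per-cluster sums and counts, while the aggregated centroids are broadcast every iteration. The party data, initialization, and round count below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

# Illustrative private data held by two parties (never shared directly)
parties = [
    np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]),
    np.array([[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]),
]
k = 2
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # assumed initialization

for _ in range(3):  # k-means iterations
    sums, counts = np.zeros((k, 2)), np.zeros(k)
    for Xp in parties:
        # Each party assigns its own points to the nearest centroid locally
        d = np.linalg.norm(Xp[:, None, :] - centroids[None, :, :], axis=2)
        a = d.argmin(axis=1)
        for j in range(k):
            sums[j] += Xp[a == j].sum(axis=0)
            counts[j] += (a == j).sum()
    # The aggregator broadcasts the updated centroids each round; this
    # intermediate disclosure is the leakage channel the paper studies.
    centroids = sums / np.maximum(counts, 1)[:, None]

print(centroids)
```

Note that only the centroid vectors cross the trust boundary each round; TAR shows that observing this sequence over iterations can nonetheless suffice to recover the raw inputs.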
Key Contributions
- First rigorous theoretical and empirical evidence that releasing intermediate centroids in federated k-means is fundamentally insecure, contradicting prior HSSP-based analyses that deemed centroid disclosure safe.
- TAR (Trajectory-Aware Reconstruction) attack that exploits temporal regularities in k-means iterations — specifically stable cluster membership and identical switching trajectories — to induce recoverable algebraic structure in the assignment matrix.
- RREF-based recoverability test that provides both theoretical guarantees and a practical check for when perfect input reconstruction is achievable.
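The recoverability idea behind the last bullet can be illustrated with a toy example. Stacking the normalized assignment rows observed across iterations yields a linear system M X = C in the private inputs X; when M attains full column rank, the disclosed centroids C determine X exactly. The assignment trajectories below are hypothetical, and a plain rank check stands in for the paper's RREF-based test:

```python
import numpy as np

# Private inputs (2-D points) that the attacker never observes directly
X = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 6.0]])
n = X.shape[0]

# Hypothetical cluster memberships across three k-means iterations.
# Each cluster contributes the normalized indicator row a_j / |S_j|,
# so that row @ X equals the disclosed centroid for that cluster.
rounds = [
    [[0], [1, 2, 3]],   # iteration 1 assignments
    [[0, 1], [2, 3]],   # iteration 2
    [[0, 1, 2], [3]],   # iteration 3
]

rows, cents = [], []
for clusters in rounds:
    for members in clusters:
        a = np.zeros(n)
        a[members] = 1.0 / len(members)
        rows.append(a)
        cents.append(a @ X)   # what the protocol discloses
M = np.vstack(rows)           # stacked normalized assignment matrix
C = np.vstack(cents)          # disclosed intermediate centroids

# Recoverability check: full column rank (the paper uses an RREF-based test)
assert np.linalg.matrix_rank(M) == n

# Exact recovery of the private inputs by solving M X = C
X_rec, *_ = np.linalg.lstsq(M, C, rcond=None)
print(np.allclose(X_rec, X))  # → True
```

The attack's leverage comes from the temporal dimension: any single iteration's rows are rank-deficient (k rows for n > k points), but points that switch clusters across iterations contribute linearly independent rows, eventually making the system solvable.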
🛡️ Threat Analysis
TAR is a data reconstruction attack on federated learning: an adversary observes the shared intermediate centroids and recovers the exact original training inputs. It is the federated-clustering analogue of gradient inversion. The leakage channel is disclosed cluster centroids rather than gradients, but the adversarial goal (reconstructing participants' private data) and the mechanism (algebraic analysis of shared model statistics) map directly onto ML03.