PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology
Kelly L Vomo-Donfack 1,2, Adryel Hoszu 1,2, Grégory Ginot 1, Ian Morilla 1,2
1 Université Sorbonne Paris Nord
2 Instituto de Hortofruticultura Subtropical y Mediterránea La Mayora
Published on arXiv: 2603.04323
- Model Inversion Attack (OWASP ML Top 10: ML03)
- Data Poisoning Attack (OWASP ML Top 10: ML02)
Key Finding
PTOPOFL achieves AUC 0.841 and 0.910 on two non-IID FL benchmarks (best in class) while reducing gradient-inversion reconstruction risk by a factor of 4.5 relative to standard gradient sharing.
PTOPOFL
Novel technique introduced
Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade aggregation quality. We introduce PTOPOFL, a framework that addresses both challenges simultaneously by replacing gradient communication with topological descriptors derived from persistent homology (PH). Clients transmit only 48-dimensional PH feature vectors (compact shape summaries whose many-to-one structure makes inversion provably ill-posed) rather than model gradients. The server performs topology-guided personalised aggregation: clients are clustered by Wasserstein similarity between their PH diagrams, intra-cluster models are topology-weighted, and clusters are blended with a global consensus. We prove an information-contraction theorem showing that PH descriptors leak strictly less mutual information per sample than gradients under strongly convex loss functions, and we establish linear convergence of the Wasserstein-weighted aggregation scheme with an error floor strictly smaller than FedAvg's. Evaluated against FedAvg, FedProx, SCAFFOLD, and pFedMe on a non-IID healthcare scenario (8 hospitals, 2 adversarial) and a pathological benchmark (10 clients), PTOPOFL achieves AUC 0.841 and 0.910 respectively, the highest in both settings, while reducing reconstruction risk by a factor of 4.5 relative to gradient sharing. Code is publicly available at https://github.com/MorillaLab/TopoFederatedL and data at https://doi.org/10.5281/zenodo.18827595.
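The abstract's client-side step (compute a compact PH feature vector instead of gradients) can be sketched in plain Python. The paper's exact filtration, homology dimensions, and vectorisation are not specified here, so this is a minimal illustration using only 0-dimensional persistence of a Vietoris–Rips filtration, where the H0 death times are exactly the minimum-spanning-tree edge weights of the point cloud; the histogram binning and the normalisation are assumptions, with the 48 matching the descriptor size stated in the abstract.

```python
import math

def h0_persistence(points):
    """H0 death times of a Vietoris-Rips filtration.
    For H0 these equal the MST edge weights (computed via Prim's algorithm)."""
    best = {i: math.dist(points[0], points[i]) for i in range(1, len(points))}
    deaths = []
    while best:
        j = min(best, key=best.get)       # cheapest edge into the tree
        deaths.append(best.pop(j))
        for i in best:                    # relax remaining distances
            best[i] = min(best[i], math.dist(points[j], points[i]))
    return sorted(deaths)

def ph_descriptor(points, dim=48):
    """Bin H0 death times into a fixed-length, normalised histogram
    (an assumed vectorisation; the paper's scheme may differ)."""
    deaths = h0_persistence(points)
    if not deaths:
        return [0.0] * dim
    top = max(deaths) or 1.0
    vec = [0.0] * dim
    for d in deaths:
        vec[min(int(dim * d / top), dim - 1)] += 1.0
    total = sum(vec)
    return [v / total for v in vec]
```

A client would send only `ph_descriptor(local_data)` to the server; the raw coordinates never leave the device.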
Key Contributions
- Replaces gradient communication with 48-dimensional persistent homology (PH) feature vectors, backed by an information-contraction theorem and a measured 4.5x reduction in reconstruction risk
- Wasserstein-weighted personalised aggregation clusters clients by topological similarity, achieving higher AUC than FedAvg, FedProx, SCAFFOLD, and pFedMe on non-IID benchmarks
- Topology-based anomaly detection exponentially suppresses adversarial client influence, with formal proof of linear convergence to an error floor below FedAvg
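The Wasserstein-weighted aggregation in the second contribution can be illustrated with a small sketch. The paper's diagram metric, clustering rule, and weighting function are not given in this summary, so the details below are assumptions: a 1-D Wasserstein (earth mover's) distance between sorted persistence values with zero-padding, a medoid diagram as the cluster consensus, and exponential weights in the distance to that medoid.

```python
import math

def wasserstein_1d(a, b):
    """1-D Wasserstein distance between two persistence summaries
    (sorted death times), padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = sorted(list(a) + [0.0] * (n - len(a)))
    b = sorted(list(b) + [0.0] * (n - len(b)))
    return sum(abs(x - y) for x, y in zip(a, b)) / n

def topology_weighted_average(models, diagrams, temp=1.0):
    """Topology-weighted model averaging (hypothetical weighting rule):
    each client is weighted by exp(-W1 distance to the medoid diagram)."""
    # Medoid = diagram minimising total distance to all others (assumption).
    totals = [sum(wasserstein_1d(d, e) for e in diagrams) for d in diagrams]
    medoid = diagrams[totals.index(min(totals))]
    w = [math.exp(-wasserstein_1d(d, medoid) / temp) for d in diagrams]
    z = sum(w)
    dim = len(models[0])
    return [sum(wi * m[k] for wi, m in zip(w, models)) / z
            for k in range(dim)]
```

With two topologically similar clients and one outlier, the aggregate stays close to the similar pair rather than being dragged linearly toward the outlier, which is the qualitative behaviour the contribution claims.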
🛡️ Threat Analysis
Explicitly models 2 adversarial clients in the healthcare FL scenario; proves that adversarial client influence decays exponentially with topological separation from the honest majority (versus linear scaling in FedAvg); topology-based anomaly detection flags and down-weights poisoning sources.
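The flag-and-down-weight step described above can be sketched as follows. The detection rule is not specified in this summary, so the robust threshold (median + k·MAD over clients' topological separations) and the exponential weights are assumptions chosen to match the claimed behaviour: clients far from the honest majority get exponentially small, then zero, influence, whereas FedAvg would give every client a constant 1/n weight regardless of distance.

```python
import math

def flag_and_weight(distances, temp=1.0, k=3.0):
    """Flag clients whose topological separation from the consensus exceeds
    median + k*MAD (an assumed robust rule), zero their weight, and give
    the remaining clients exponentially decaying weights exp(-d/temp)."""
    s = sorted(distances)
    med = s[len(s) // 2]
    mad = sorted(abs(d - med) for d in distances)[len(distances) // 2]
    thresh = med + k * (mad if mad > 0 else 1e-9)
    flags = [d > thresh for d in distances]
    weights = [0.0 if f else math.exp(-d / temp)
               for d, f in zip(distances, flags)]
    z = sum(weights) or 1.0
    return flags, [w / z for w in weights]
```

With 8 clients of which 2 sit far from the honest majority (mirroring the paper's 8-hospital, 2-adversarial scenario), both outliers are flagged and receive zero aggregation weight.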
Primary security contribution is defending against gradient-inversion attacks (citing Zhu et al., Geiping et al.) by transmitting 48-dimensional PH feature vectors whose many-to-one structure makes reconstruction provably ill-posed; the paper proves that PH descriptors leak strictly less per-sample mutual information than gradients and measures a 4.5x reduction in reconstruction risk relative to gradient sharing.
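The many-to-one property underlying the inversion defence can be demonstrated concretely: a PH summary depends only on the multiset of pairwise distances, so any translation and reordering of a client's raw data yields an identical transmitted descriptor, and the server cannot recover the original coordinates. The sketch below uses a compact H0 persistence routine (MST edge weights via Prim's algorithm) as a stand-in for the paper's descriptor, which is an assumption; the chosen coordinates are small integers so the distance arithmetic is exact in floating point.

```python
import math
import random

def h0_deaths(points):
    """H0 persistence death times = MST edge weights (Prim's algorithm)."""
    best = {i: math.dist(points[0], points[i]) for i in range(1, len(points))}
    deaths = []
    while best:
        j = min(best, key=best.get)
        deaths.append(best.pop(j))
        for i in best:
            best[i] = min(best[i], math.dist(points[j], points[i]))
    return sorted(deaths)

# Two distinct raw datasets: the second is a translated, shuffled copy.
data_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
data_b = [(x + 7.0, y - 4.0) for x, y in data_a]
random.shuffle(data_b)

# Identical descriptors: the mapping from data to descriptor is many-to-one,
# so inverting the descriptor back to raw coordinates is ill-posed.
assert h0_deaths(data_a) == h0_deaths(data_b)
```

This is only the isometry part of the argument; the paper's information-contraction theorem makes the stronger, quantitative claim about per-sample mutual information under strongly convex losses.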