Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials

The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. However, stringent privacy regulations such as GDPR and HIPAA have created data silos that prevent centralized training. Federated Learning (FL) has emerged as a promising solution that enables collaborative model training without sharing raw patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent approaches integrate Blockchain technology for auditability, they predominantly rely on probabilistic reputation systems rather than robust cryptographic identity verification. This paper proposes a Trustworthy Blockchain-based Federated Learning (TBFL) framework integrating Self-Sovereign Identity (SSI) standards. By leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), our architecture ensures only authenticated healthcare entities contribute to the global model. Through comprehensive evaluation using the MIMIC-IV dataset, we demonstrate that anchoring trust in cryptographic identity verification rather than behavioral patterns significantly mitigates security risks while maintaining clinical utility. Our results show the framework successfully neutralizes 100% of Sybil attacks, achieves robust predictive performance (AUC = 0.954, Recall = 0.890), and introduces negligible computational overhead (<0.12%). The approach provides a secure, scalable, and economically viable ecosystem for inter-institutional health data collaboration, with total operational costs of approximately $18 for 100 training rounds across multiple institutions.

Key Contributions

TBFL framework integrating Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) into federated learning to enforce cryptographic participant authentication before aggregation
Proactive Sybil attack neutralization through SSI-based identity verification, achieving 100% Sybil suppression on MIMIC-IV EHR experiments
Demonstration that identity-anchored trust (cryptographic credentials) outperforms behavioral/reputation-based FL security with negligible computational overhead (<0.12%)
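The idea of authenticating participants before aggregation can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes an HMAC-SHA256 tag as a stand-in for the issuer's public-key signature on a Verifiable Credential, plain unweighted averaging as the aggregation rule, and hypothetical names throughout (`ISSUER_KEY`, `Credential`, `issue`, `verify`, `gated_fedavg`, and the `did:example:` identifiers).

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# ASSUMPTION: a shared-secret HMAC stands in for the issuer's asymmetric
# signature; a real Verifiable Credential carries a public-key proof.
ISSUER_KEY = b"hypothetical-issuer-secret"


@dataclass(frozen=True)
class Credential:
    did: str    # holder's decentralized identifier (hypothetical value)
    role: str   # claim asserted by the issuer
    proof: str  # hex tag standing in for the issuer's signature


def _payload(did: str, role: str) -> bytes:
    # Canonical byte encoding of the claims covered by the proof.
    return json.dumps({"did": did, "role": role}, sort_keys=True).encode()


def issue(did: str, role: str, key: bytes = ISSUER_KEY) -> Credential:
    proof = hmac.new(key, _payload(did, role), hashlib.sha256).hexdigest()
    return Credential(did, role, proof)


def verify(cred: Credential, key: bytes = ISSUER_KEY) -> bool:
    expected = hmac.new(key, _payload(cred.did, cred.role), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cred.proof)


def gated_fedavg(submissions):
    """Average only updates whose credential verifies, one update per DID."""
    accepted = {}
    for cred, update in submissions:
        if verify(cred) and cred.did not in accepted:
            accepted[cred.did] = update
    if not accepted:
        raise ValueError("no authenticated updates to aggregate")
    n = len(accepted)
    dim = len(next(iter(accepted.values())))
    return [sum(u[i] for u in accepted.values()) / n for i in range(dim)]


# Usage: two credentialed hospitals and one forged identity.
hospital_a = issue("did:example:hospital-a", "healthcare-provider")
hospital_b = issue("did:example:hospital-b", "healthcare-provider")
forged = Credential("did:example:sybil-1", "healthcare-provider", "00" * 32)

global_update = gated_fedavg([
    (hospital_a, [1.0, 2.0]),
    (hospital_b, [3.0, 4.0]),
    (forged, [100.0, 100.0]),  # rejected: proof does not verify
])
print(global_update)  # [2.0, 3.0]
```

The forged submission is dropped before averaging, so it has no influence on the result regardless of how extreme its update is. This is the property that distinguishes identity gating from reputation scoring, which has to observe behavior first.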

🛡️ Threat Analysis

Data Poisoning Attack

The primary threat addressed is federated learning poisoning, in which malicious or Sybil participants inject corrupted gradients to degrade the global model. This is Byzantine attack defense in federated learning, which is explicitly covered under ML02. The TBFL framework acts as a Byzantine-fault-tolerant protocol by cryptographically gating who can contribute updates.
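A toy calculation shows why fake identities matter under naive aggregation. The numbers below are illustrative only and are not taken from the paper. The sketch assumes plain unweighted averaging, which this summary does not confirm as the paper's aggregation rule, and `mean_update` is a hypothetical helper.

```python
def mean_update(updates):
    """Unweighted coordinate-wise mean of a list of equal-length updates."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]


# Four honest participants submit the same benign update (toy values).
honest = [[0.5, -0.25] for _ in range(4)]

# One attacker registers four fake identities and submits a corrupted update
# under each of them.
sybil = [[-4.0, 4.0] for _ in range(4)]

clean = mean_update(honest)             # aggregation over honest updates only
poisoned = mean_update(honest + sybil)  # aggregation with Sybil updates admitted

print(clean)     # [0.5, -0.25]
print(poisoned)  # [-1.75, 1.875]
```

With n honest and k Sybil updates, the mean moves by a fraction k / (n + k) of the gap between the malicious update and the honest mean. Here that fraction is one half, so a single attacker controls half of the aggregate simply by minting identities. Admitting only credentialed participants removes that lever.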

Details

Domains

federated-learning

Model Types

federated, traditional_ml

Threat Tags

training_time, grey_box

Datasets

MIMIC-IV

Applications

2026 0 cit.

Data Poisoning Attack

90%