defense 2026

Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials

Rodrigo Tertulino , Ricardo Almeida , Laercio Alencar

0 citations · 61 references · arXiv (Cornell University)

Published on arXiv

2602.02629

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

The framework neutralizes 100% of Sybil attacks in the MIMIC-IV experiments while maintaining AUC = 0.954 and adding less than 0.12% computational overhead to federated EHR training.

TBFL (Trustworthy Blockchain-based Federated Learning)

Novel technique introduced


The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. However, stringent privacy regulations such as GDPR and HIPAA have created data silos that prevent centralized training. Federated Learning (FL) has emerged as a promising solution that enables collaborative model training without sharing raw patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent approaches integrate Blockchain technology for auditability, they predominantly rely on probabilistic reputation systems rather than robust cryptographic identity verification. This paper proposes a Trustworthy Blockchain-based Federated Learning (TBFL) framework integrating Self-Sovereign Identity (SSI) standards. By leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), our architecture ensures only authenticated healthcare entities contribute to the global model. Through comprehensive evaluation using the MIMIC-IV dataset, we demonstrate that anchoring trust in cryptographic identity verification rather than behavioral patterns significantly mitigates security risks while maintaining clinical utility. Our results show the framework successfully neutralizes 100% of Sybil attacks, achieves robust predictive performance (AUC = 0.954, Recall = 0.890), and introduces negligible computational overhead (<0.12%). The approach provides a secure, scalable, and economically viable ecosystem for inter-institutional health data collaboration, with total operational costs of approximately $18 for 100 training rounds across multiple institutions.


Key Contributions

  • TBFL framework integrating Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) into federated learning to enforce cryptographic participant authentication before aggregation
  • Proactive Sybil attack neutralization through SSI-based identity verification, achieving 100% Sybil suppression on MIMIC-IV EHR experiments
  • Demonstration that identity-anchored trust (cryptographic credentials) outperforms behavioral/reputation-based FL security with negligible computational overhead (<0.12%)
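The idea of authenticating participants before aggregation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Credential`, `issue_credential`, `is_valid`, and `gated_average` are hypothetical, an HMAC under a shared issuer secret stands in for the asymmetric signature a real Verifiable Credential would carry, the `did:example:` identifiers are placeholders, and the aggregation is a plain unweighted mean rather than a sample-weighted federated average.

```python
import hashlib
import hmac
from dataclasses import dataclass

# ASSUMPTION: a shared-secret HMAC stands in for a VC signature. A real
# deployment would verify an asymmetric signature against the issuer's
# public key resolved from its DID document.
ISSUER_KEY = b"hypothetical-issuer-secret"


@dataclass(frozen=True)
class Credential:
    did: str    # placeholder identifier, e.g. "did:example:hospital-a"
    proof: str  # hex MAC over the DID, produced by the issuer


def issue_credential(did: str, issuer_key: bytes = ISSUER_KEY) -> Credential:
    """Issuer side: bind a proof to the participant's DID."""
    proof = hmac.new(issuer_key, did.encode(), hashlib.sha256).hexdigest()
    return Credential(did, proof)


def is_valid(cred: Credential, issuer_key: bytes = ISSUER_KEY) -> bool:
    """Verifier side: recompute the proof and compare in constant time."""
    expected = hmac.new(issuer_key, cred.did.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cred.proof)


def gated_average(submissions, issuer_key: bytes = ISSUER_KEY):
    """Average only updates whose credential verifies.

    submissions: iterable of (Credential, list[float]) pairs.
    At most one update is kept per DID, so repeating a valid
    credential does not give a participant extra weight.
    """
    accepted = {}
    for cred, update in submissions:
        if is_valid(cred, issuer_key):
            accepted[cred.did] = update
    if not accepted:
        raise ValueError("no authenticated updates to aggregate")
    updates = list(accepted.values())
    return [sum(column) / len(updates) for column in zip(*updates)]


# Two credentialed hospitals and one Sybil with a forged proof.
hospital_a = issue_credential("did:example:hospital-a")
hospital_b = issue_credential("did:example:hospital-b")
sybil = Credential("did:example:fake-clinic", "00" * 32)

global_update = gated_average([
    (hospital_a, [1.0, 2.0]),
    (hospital_b, [3.0, 4.0]),
    (sybil, [100.0, 100.0]),  # rejected before aggregation
])
```

In this sketch the forged credential never reaches the averaging step, so the poisoned update has no influence on the result regardless of how plausible its values look. That is the contrast with reputation-based filtering, which has to judge the update itself.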

🛡️ Threat Analysis

Data Poisoning Attack

The primary threat defended against is federated-learning poisoning, in which malicious or Sybil participants inject corrupted gradients to degrade the global model. This is Byzantine attack defense in federated learning, which falls under ML02. The TBFL framework acts as a Byzantine-fault-tolerant protocol by cryptographically gating who can contribute updates.


Details

Domains
federated-learning
Model Types
federated, traditional_ml
Threat Tags
training_time, grey_box
Datasets
MIMIC-IV
Applications
electronic health records, federated learning for healthcare