Proof-of-Data: A Consensus Protocol for Collaborative Intelligence
Huiwen Liu, Feida Zhu, Ling Cheng
Published on arXiv (arXiv:2501.02971)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
PoD achieves model-training performance close to that of centralized federated learning while tolerating up to 1/3 Byzantine nodes and ensuring fair reward allocation without a central coordinator.
Proof-of-Data (PoD)
Novel technique introduced
Existing research on federated learning has focused on settings where learning is coordinated by a centralized entity. Yet the greatest potential of future collaborative intelligence lies in a more open and democratized setting with no central entity in a dominant role, referred to as "decentralized federated learning". New challenges arise in achieving both correct model training and fair reward allocation through the collective effort of all participating nodes, especially under the threat of Byzantine nodes jeopardizing both tasks. In this paper, we propose a blockchain-based, Byzantine-fault-tolerant decentralized federated learning framework built on a novel Proof-of-Data (PoD) consensus protocol to resolve both the "trust" and "incentive" components. By decoupling model training from contribution accounting, PoD enjoys not only the learning efficiency and system liveness of asynchronous, societal-scale PoW-style learning but also the finality of consensus and reward allocation from epoch-based BFT-style voting. To mitigate false reward claims via data forgery by Byzantine attackers, a privacy-aware data verification and contribution-based reward allocation mechanism completes the framework. Our evaluation results show that PoD achieves model-training performance close to that of its centralized counterpart while achieving trust in consensus and fairness in reward allocation with a fault-tolerance ratio of 1/3.
Key Contributions
- Proof-of-Data (PoD) consensus protocol that decouples model training from contribution accounting, combining asynchronous PoW-style learning with epoch-based BFT voting for finality
- Privacy-aware data verification mechanism to detect and mitigate false reward claims from Byzantine data forgery attacks
- Byzantine-fault-tolerant decentralized FL framework achieving 1/3 fault tolerance ratio with model accuracy comparable to centralized FL
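The contribution-based reward allocation named in the contributions can be sketched as a proportional split over verified contributions. The `allocate_rewards` helper and its inputs are hypothetical; the paper's actual accounting and verification pipeline is more involved, and this only shows the fairness principle that rewards track verified contribution shares.

```python
# Hypothetical sketch of contribution-based reward allocation: each
# node's verified data contribution earns a proportional share of the
# epoch's reward pool. Contributions that failed the privacy-aware
# data verification (e.g. forged data claims) are assumed to have been
# zeroed out before this step.
def allocate_rewards(contributions: dict[str, float],
                     reward_pool: float) -> dict[str, float]:
    """Split reward_pool proportionally to verified contributions."""
    total = sum(contributions.values())
    if total == 0:
        return {node: 0.0 for node in contributions}
    return {node: reward_pool * c / total
            for node, c in contributions.items()}


# Example: node "c" submitted forged data and was zeroed by verification.
shares = allocate_rewards({"a": 3.0, "b": 1.0, "c": 0.0}, reward_pool=100.0)
print(shares)  # {'a': 75.0, 'b': 25.0, 'c': 0.0}
```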
🛡️ Threat Analysis
The paper's security contribution is defending against Byzantine nodes in decentralized federated learning: malicious participants who submit forged data or corrupt model updates to degrade global model training. The proposed PoD protocol includes data verification and robust aggregation mechanisms specifically to resist this threat, matching the BFT-FL defense pattern explicitly called out under ML02.
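One standard robust-aggregation defense against poisoned updates is a coordinate-wise trimmed mean. This is an illustrative sketch only; the paper's own aggregation rule may differ, and the `trimmed_mean` function and example values are assumptions chosen to show how discarding the `f` most extreme values per coordinate bounds the influence of up to `f` corrupt updates.

```python
# Illustrative defense sketch (not necessarily PoD's mechanism):
# coordinate-wise trimmed mean. For each model parameter, drop the f
# lowest and f highest submitted values, then average the rest, so a
# single extreme (poisoned) update cannot drag the aggregate.
def trimmed_mean(updates: list[list[float]], f: int) -> list[float]:
    """Aggregate updates per coordinate, trimming f values on each side."""
    n = len(updates)
    assert n > 2 * f, "need more values than are trimmed away"
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        vals = sorted(u[j] for u in updates)
        kept = vals[f:n - f]                 # discard f extremes per side
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Example: 4 updates; one Byzantine node submits an extreme gradient,
# which the trimming discards in both coordinates.
updates = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.1], [100.0, -100.0]]
print(trimmed_mean(updates, f=1))
```

The trimming step is what makes the aggregate Byzantine-robust: as long as the number of corrupt updates stays within `f`, every poisoned value falls inside the discarded extremes or is outvoted by honest ones.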