Proof-of-Data: A Consensus Protocol for Collaborative Intelligence
Huiwen Liu, Feida Zhu, Ling Cheng
Published on arXiv (arXiv:2501.02971)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
PoD achieves model-training performance close to that of centralized federated learning while tolerating up to 1/3 Byzantine nodes and ensuring fair reward allocation without a central coordinator.
Proof-of-Data (PoD)
Novel technique introduced
Existing research on federated learning has focused on settings where learning is coordinated by a centralized entity. Yet the greatest potential of future collaborative intelligence lies in a more open and democratized setting with no central entity in a dominant role, referred to as "decentralized federated learning". New challenges arise in achieving both correct model training and fair reward allocation through the collective effort of all participating nodes, especially under the threat of Byzantine nodes jeopardizing both tasks. In this paper, we propose a blockchain-based, Byzantine-fault-tolerant decentralized federated learning framework built on a novel Proof-of-Data (PoD) consensus protocol to resolve both the "trust" and "incentive" components. By decoupling model training from contribution accounting, PoD enjoys not only the learning efficiency and system liveness of asynchronous, societal-scale PoW-style learning but also the finality of consensus and reward allocation from epoch-based BFT-style voting. To mitigate false reward claims via data forgery by Byzantine attackers, a privacy-aware data verification and contribution-based reward allocation mechanism completes the framework. Our evaluation results show that PoD achieves model-training performance close to that of its centralized counterpart while achieving trust in consensus and fairness in reward allocation with a fault-tolerance ratio of 1/3.
Key Contributions
- Proof-of-Data (PoD) consensus protocol that decouples model training from contribution accounting, combining asynchronous PoW-style learning with epoch-based BFT voting for finality
- Privacy-aware data verification mechanism to detect and mitigate false reward claims from Byzantine data forgery attacks
- Byzantine-fault-tolerant decentralized FL framework achieving 1/3 fault tolerance ratio with model accuracy comparable to centralized FL
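The contribution-based reward allocation named in the contributions can be sketched as a proportional split over verified contributions. The `allocate_rewards` helper and its inputs are hypothetical; the paper's actual accounting and verification pipeline is more involved, and this only shows the fairness principle that rewards track verified contribution shares.

```python
# Hypothetical sketch of contribution-based reward allocation: each
# node's verified data contribution earns a proportional share of the
# epoch's reward pool. Contributions that failed the privacy-aware
# data verification (e.g. forged data claims) are assumed to have been
# zeroed out before this step.
def allocate_rewards(contributions: dict[str, float],
                     reward_pool: float) -> dict[str, float]:
    """Split reward_pool proportionally to verified contributions."""
    total = sum(contributions.values())
    if total == 0:
        return {node: 0.0 for node in contributions}
    return {node: reward_pool * c / total
            for node, c in contributions.items()}


# Example: node "c" submitted forged data and was zeroed by verification.
shares = allocate_rewards({"a": 3.0, "b": 1.0, "c": 0.0}, reward_pool=100.0)
print(shares)  # {'a': 75.0, 'b': 25.0, 'c': 0.0}
```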
🛡️ Threat Analysis
The paper's security contribution is defending against Byzantine nodes in decentralized federated learning: malicious participants who submit forged data or corrupt model updates to degrade global model training. The proposed PoD protocol includes data verification and robust aggregation mechanisms specifically to resist this threat, matching the BFT-FL defense pattern explicitly called out under ML02.
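One standard robust-aggregation defense against poisoned updates is a coordinate-wise trimmed mean. This is an illustrative sketch only; the paper's own aggregation rule may differ, and the `trimmed_mean` function and example values are assumptions chosen to show how discarding the `f` most extreme values per coordinate bounds the influence of up to `f` corrupt updates.

```python
# Illustrative defense sketch (not necessarily PoD's mechanism):
# coordinate-wise trimmed mean. For each model parameter, drop the f
# lowest and f highest submitted values, then average the rest, so a
# single extreme (poisoned) update cannot drag the aggregate.
def trimmed_mean(updates: list[list[float]], f: int) -> list[float]:
    """Aggregate updates per coordinate, trimming f values on each side."""
    n = len(updates)
    assert n > 2 * f, "need more values than are trimmed away"
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        vals = sorted(u[j] for u in updates)
        kept = vals[f:n - f]                 # discard f extremes per side
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Example: 4 updates; one Byzantine node submits an extreme gradient,
# which the trimming discards in both coordinates.
updates = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.1], [100.0, -100.0]]
print(trimmed_mean(updates, f=1))
```

The trimming step is what makes the aggregate Byzantine-robust: as long as the number of corrupt updates stays within `f`, every poisoned value falls inside the discarded extremes or is outvoted by honest ones.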