D2M: A Decentralized, Privacy-Preserving, Incentive-Compatible Data Marketplace for Collaborative Learning
Yash Srivastava 1, Shalin Jain 1, Sneha Awathare 2, Nitin Awathare 1
Published on arXiv (2512.10372)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves less than 3% accuracy degradation on MNIST and Fashion-MNIST with up to 30% Byzantine nodes, demonstrating robust aggregation under significant adversarial participation.
Corrected OSMD
Novel technique introduced
The rising demand for collaborative machine learning and data analytics calls for secure, decentralized data-sharing frameworks that balance privacy, trust, and incentives. Existing approaches, including federated learning (FL) and blockchain-based data markets, fall short: FL often depends on trusted aggregators and lacks Byzantine robustness, while blockchain frameworks struggle with computation-intensive training and incentive integration. We present D2M, a decentralized data marketplace that unifies federated learning, blockchain arbitration, and economic incentives into a single framework for privacy-preserving data sharing. D2M enables data buyers to submit bid-based requests via blockchain smart contracts, which manage auctions, escrow, and dispute resolution. Computationally intensive training is delegated to CONE (Compute Network for Execution), an off-chain distributed execution layer. To safeguard against adversarial behavior, D2M integrates a modified YODA protocol with exponentially growing execution sets for resilient consensus, and introduces Corrected OSMD to mitigate malicious or low-quality contributions from sellers. All protocols are incentive-compatible, and our game-theoretic analysis establishes honesty as the dominant strategy. We implement D2M on Ethereum and evaluate it over benchmark datasets (MNIST, Fashion-MNIST, and CIFAR-10) under varying adversarial settings. D2M achieves up to 99% accuracy on MNIST and 90% on Fashion-MNIST, with less than 3% degradation at up to 30% Byzantine nodes, and 56% accuracy on CIFAR-10 despite that dataset's greater complexity. Our results show that D2M ensures privacy, maintains robustness under adversarial conditions, and scales efficiently with the number of participants, making it a practical foundation for real-world decentralized data sharing.
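The abstract's "exponentially growing execution sets" can be illustrated with a minimal sketch: start with a small set of off-chain compute nodes, and double the set whenever the collected results fail to reach a quorum. This is only an illustrative interpretation, not the paper's actual modified YODA protocol; the function names, the doubling rule, and the 75% default quorum are all assumptions for the example.

```python
import random

def run_execution_set(task, nodes, set_size):
    # Hypothetical helper: sample `set_size` nodes and collect their results.
    sampled = random.sample(nodes, set_size)
    return [node(task) for node in sampled]

def resilient_consensus(task, nodes, initial_size=2, quorum=0.75):
    # Grow the execution set exponentially (doubling) until a quorum of
    # sampled nodes agree on a result; assumed parameters, not the paper's.
    size = initial_size
    while True:
        size = min(size, len(nodes))
        results = run_execution_set(task, nodes, size)
        top = max(set(results), key=results.count)
        if results.count(top) / len(results) >= quorum:
            return top  # quorum reached: accept the majority result
        if size == len(nodes):
            raise RuntimeError("no quorum even with the full network")
        size *= 2  # escalate: double the execution set and retry
```

The appeal of exponential growth is that honest-majority tasks settle cheaply with a small set, while disputed tasks escalate quickly to a larger, harder-to-corrupt sample.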
Key Contributions
- Corrected OSMD robust aggregation algorithm that mitigates malicious or low-quality contributor updates in federated learning
- Decentralized marketplace (D2M) combining blockchain smart contracts, off-chain compute (CONE), and Byzantine-fault-tolerant FL with incentive compatibility
- Game-theoretic analysis proving honesty is the dominant strategy for all participants under the proposed incentive mechanism
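The paper does not spell out Corrected OSMD here, but the class of Byzantine-robust aggregators it belongs to can be sketched with a coordinate-wise trimmed mean, a standard illustrative stand-in: per model coordinate, discard the most extreme client values before averaging, which bounds the influence of a bounded fraction of malicious contributors. The function below is an assumption-laden sketch, not the paper's algorithm.

```python
def trimmed_mean_aggregate(updates, trim_frac=0.3):
    """Coordinate-wise trimmed mean over client updates (lists of floats).

    Drops the `trim_frac` lowest and highest values in each coordinate,
    limiting how far up to ~trim_frac Byzantine clients can skew the
    aggregate. Illustrative stand-in only, not Corrected OSMD itself.
    """
    n = len(updates)
    k = int(n * trim_frac)  # number of values trimmed from each tail
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[k:n - k] if n - 2 * k > 0 else column
        aggregated.append(sum(kept) / len(kept))
    return aggregated
```

With `trim_frac=0.3`, matching the paper's evaluated 30% Byzantine fraction, every poisoned value lands in a trimmed tail as long as attackers push in one direction per coordinate.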
🛡️ Threat Analysis
The paper's primary ML security contribution is Byzantine-fault-tolerant aggregation via 'Corrected OSMD', which mitigates malicious or low-quality model updates from sellers/participants in a federated learning setting. The system is evaluated under up to 30% Byzantine nodes, confirming the adversarial threat model is central to the design.
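To see why robust aggregation is central to this threat model, consider what happens without it: naive federated averaging gives every client equal weight, so a single attacker submitting a heavily scaled update drags the global model arbitrarily far. A minimal sketch (one-dimensional updates for clarity, not taken from the paper):

```python
def fedavg(updates):
    # Naive federated averaging: unweighted mean of client updates.
    n = len(updates)
    return [sum(u[j] for u in updates) / n for j in range(len(updates[0]))]

# Nine honest clients report an update of 1.0; one attacker submits a
# scaled poisoned update of -100.0 to hijack the aggregate.
honest = [[1.0]] * 9
poisoned = [[-100.0]]
global_update = fedavg(honest + poisoned)  # pulled far from 1.0
```

Here the mean lands at -9.1 despite 90% honest participation, which is exactly the failure mode that Byzantine-robust aggregators such as Corrected OSMD are designed to prevent.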