D2M: A Decentralized, Privacy-Preserving, Incentive-Compatible Data Marketplace for Collaborative Learning
Yash Srivastava 1, Shalin Jain 1, Sneha Awathare 2, Nitin Awathare 1
Published on arXiv (2512.10372)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves less than 3% accuracy degradation on MNIST and Fashion-MNIST with up to 30% Byzantine nodes, demonstrating robust aggregation under significant adversarial participation.
Corrected OSMD
Novel technique introduced
The rising demand for collaborative machine learning and data analytics calls for secure, decentralized data-sharing frameworks that balance privacy, trust, and incentives. Existing approaches, including federated learning (FL) and blockchain-based data markets, fall short: FL often depends on trusted aggregators and lacks Byzantine robustness, while blockchain frameworks struggle with computation-intensive training and incentive integration. We present D2M, a decentralized data marketplace that unifies federated learning, blockchain arbitration, and economic incentives into a single framework for privacy-preserving data sharing. D2M enables data buyers to submit bid-based requests via blockchain smart contracts, which manage auctions, escrow, and dispute resolution. Computationally intensive training is delegated to CONE (Compute Network for Execution), an off-chain distributed execution layer. To safeguard against adversarial behavior, D2M integrates a modified YODA protocol with exponentially growing execution sets for resilient consensus, and introduces Corrected OSMD to mitigate malicious or low-quality contributions from sellers. All protocols are incentive-compatible, and our game-theoretic analysis establishes honesty as the dominant strategy. We implement D2M on Ethereum and evaluate it over benchmark datasets (MNIST, Fashion-MNIST, and CIFAR-10) under varying adversarial settings. D2M achieves up to 99% accuracy on MNIST and 90% on Fashion-MNIST, with less than 3% degradation at up to 30% Byzantine nodes, and 56% accuracy on CIFAR-10 despite that dataset's greater complexity. Our results show that D2M ensures privacy, maintains robustness under adversarial conditions, and scales efficiently with the number of participants, making it a practical foundation for real-world decentralized data sharing.
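The abstract's "exponentially growing execution sets" can be illustrated with a minimal sketch: start with a small set of off-chain compute nodes, and double the set whenever the collected results fail to reach a quorum. This is only an illustrative interpretation, not the paper's actual modified YODA protocol; the function names, the doubling rule, and the 75% default quorum are all assumptions for the example.

```python
import random

def run_execution_set(task, nodes, set_size):
    # Hypothetical helper: sample `set_size` nodes and collect their results.
    sampled = random.sample(nodes, set_size)
    return [node(task) for node in sampled]

def resilient_consensus(task, nodes, initial_size=2, quorum=0.75):
    # Grow the execution set exponentially (doubling) until a quorum of
    # sampled nodes agree on a result; assumed parameters, not the paper's.
    size = initial_size
    while True:
        size = min(size, len(nodes))
        results = run_execution_set(task, nodes, size)
        top = max(set(results), key=results.count)
        if results.count(top) / len(results) >= quorum:
            return top  # quorum reached: accept the majority result
        if size == len(nodes):
            raise RuntimeError("no quorum even with the full network")
        size *= 2  # escalate: double the execution set and retry
```

The appeal of exponential growth is that honest-majority tasks settle cheaply with a small set, while disputed tasks escalate quickly to a larger, harder-to-corrupt sample.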
Key Contributions
- Corrected OSMD robust aggregation algorithm that mitigates malicious or low-quality contributor updates in federated learning
- Decentralized marketplace (D2M) combining blockchain smart contracts, off-chain compute (CONE), and Byzantine-fault-tolerant FL with incentive compatibility
- Game-theoretic analysis proving honesty is the dominant strategy for all participants under the proposed incentive mechanism
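The paper does not spell out Corrected OSMD here, but the class of Byzantine-robust aggregators it belongs to can be sketched with a coordinate-wise trimmed mean, a standard illustrative stand-in: per model coordinate, discard the most extreme client values before averaging, which bounds the influence of a bounded fraction of malicious contributors. The function below is an assumption-laden sketch, not the paper's algorithm.

```python
def trimmed_mean_aggregate(updates, trim_frac=0.3):
    """Coordinate-wise trimmed mean over client updates (lists of floats).

    Drops the `trim_frac` lowest and highest values in each coordinate,
    limiting how far up to ~trim_frac Byzantine clients can skew the
    aggregate. Illustrative stand-in only, not Corrected OSMD itself.
    """
    n = len(updates)
    k = int(n * trim_frac)  # number of values trimmed from each tail
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[k:n - k] if n - 2 * k > 0 else column
        aggregated.append(sum(kept) / len(kept))
    return aggregated
```

With `trim_frac=0.3`, matching the paper's evaluated 30% Byzantine fraction, every poisoned value lands in a trimmed tail as long as attackers push in one direction per coordinate.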
🛡️ Threat Analysis
The paper's primary ML security contribution is Byzantine-fault-tolerant aggregation via 'Corrected OSMD', which mitigates malicious or low-quality model updates from sellers/participants in a federated learning setting. The system is evaluated under up to 30% Byzantine nodes, confirming the adversarial threat model is central to the design.
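To see why robust aggregation is central to this threat model, consider what happens without it: naive federated averaging gives every client equal weight, so a single attacker submitting a heavily scaled update drags the global model arbitrarily far. A minimal sketch (one-dimensional updates for clarity, not taken from the paper):

```python
def fedavg(updates):
    # Naive federated averaging: unweighted mean of client updates.
    n = len(updates)
    return [sum(u[j] for u in updates) / n for j in range(len(updates[0]))]

# Nine honest clients report an update of 1.0; one attacker submits a
# scaled poisoned update of -100.0 to hijack the aggregate.
honest = [[1.0]] * 9
poisoned = [[-100.0]]
global_update = fedavg(honest + poisoned)  # pulled far from 1.0
```

Here the mean lands at -9.1 despite 90% honest participation, which is exactly the failure mode that Byzantine-robust aggregators such as Corrected OSMD are designed to prevent.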