FedOnco-Bench: A Reproducible Benchmark for Privacy-Aware Federated Tumor Segmentation with Synthetic CT Data
Viswa Chaitanya Marella , Suhasnadh Reddy Veluru , Sai Teja Erukude
Published on arXiv: 2511.00795
Membership Inference Attack (OWASP ML Top 10 — ML04)
Key Finding
FedAvg achieves Dice ~0.85 but MIA AUC ~0.72, whereas DP-SGD reduces MIA AUC to ~0.25 at the cost of Dice ~0.79, demonstrating a clear privacy-utility tradeoff in federated tumor segmentation.
Novel technique introduced: FedOnco-Bench
Federated Learning (FL) allows multiple institutions to cooperatively train machine learning models while keeping sensitive data at its source, making it attractive for privacy-sensitive settings. However, FL systems remain vulnerable to membership inference attacks and to data heterogeneity across clients. This paper presents FedOnco-Bench, a reproducible benchmark for privacy-aware FL built on synthetic oncologic CT scans with tumor annotations. It evaluates segmentation performance and privacy leakage across four FL methods: FedAvg, FedProx, FedBN, and FedAvg with DP-SGD. Results show a distinct privacy-utility trade-off: FedAvg achieves high segmentation performance (Dice ≈ 0.85) but leaks more membership information (attack AUC ≈ 0.72), while DP-SGD provides much stronger privacy (AUC ≈ 0.25) at a cost in accuracy (Dice ≈ 0.79). FedProx and FedBN offer balanced performance under heterogeneous, non-identically distributed client data. FedOnco-Bench serves as a standardized, open-source platform for benchmarking and developing privacy-preserving FL methods for medical image segmentation.
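As a reference point for the aggregation rules being compared, FedAvg's server step is a sample-count-weighted average of client parameters. A minimal sketch (function and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg server step: average each parameter tensor across clients,
    weighted by the number of local training samples per client.

    client_weights: list of dicts mapping layer name -> np.ndarray
    client_sizes:   list of local dataset sizes, one per client
    """
    total = sum(client_sizes)
    return {
        name: sum((n / total) * w[name]
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Two toy clients with unequal data volumes: the larger client
# (3 samples vs 1) dominates the aggregate.
w1 = {"conv1": np.ones((2, 2))}
w2 = {"conv1": np.zeros((2, 2))}
global_w = fedavg([w1, w2], client_sizes=[3, 1])
print(global_w["conv1"][0, 0])  # 0.75
```

FedProx modifies the client objective (a proximal term pulling local weights toward the global model) and FedBN keeps batch-norm statistics local, but both still use this weighted averaging on the server side.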
Key Contributions
- Synthetic non-IID oncologic CT dataset distributed across simulated FL clients to replicate realistic clinical heterogeneity
- Standardized evaluation of FedAvg, FedProx, FedBN, and FedAvg+DP-SGD on both segmentation quality (Dice) and privacy leakage (MIA AUC)
- Quantified privacy-utility tradeoff: DP-SGD cuts MIA AUC from 0.72 to 0.25 at a cost of ~6 Dice points, while FedProx/FedBN offer balance under non-IID data
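The privacy mechanism behind the DP-SGD arm is per-sample gradient clipping followed by Gaussian noise before averaging. A minimal sketch, with assumed hyperparameters (`clip_norm` and `noise_mult` are illustrative; a real deployment would also track the cumulative privacy budget with an accountant):

```python
import numpy as np

def dp_sgd_aggregate(per_sample_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Illustrative DP-SGD update (not the paper's code):
    clip each per-sample gradient to clip_norm in L2 norm, sum,
    add Gaussian noise with std noise_mult * clip_norm, then average.
    Clipping bounds any one sample's influence; noise masks the rest."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = [
        g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
        for g in per_sample_grads
    ]
    noise = rng.normal(0.0, noise_mult * clip_norm, size=clipped[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_sample_grads)

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
# With noise disabled, only the clipping effect remains:
noiseless = dp_sgd_aggregate(grads, clip_norm=1.0, noise_mult=0.0)
print(noiseless)  # [0.45 0.6 ] -- first gradient was scaled down to norm 1
```

The ~6-point Dice drop reported for DP-SGD is the direct cost of this clipping and noise, which is also what collapses the attack AUC toward chance and below.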
🛡️ Threat Analysis
The core evaluation metric is membership inference attack AUC across FL methods: FedAvg leaks at AUC ~0.72 while DP-SGD reduces it to ~0.25. The adversarial threat model is explicit: an attacker tries to infer whether a specific patient's data was included in the training set.
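The MIA AUC metric can be illustrated with the simplest attack in this family: score each example by its negated loss, since training-set members tend to have lower loss than non-members. A minimal sketch using a rank-statistic AUC (the paper's attack may be more sophisticated; names here are illustrative):

```python
import numpy as np

def mia_auc(member_losses, nonmember_losses):
    """AUC of a loss-threshold membership inference attack.
    Score = -loss, so low-loss (likely member) examples rank highest.
    AUC is computed via the Mann-Whitney rank statistic."""
    scores = np.concatenate([-np.asarray(member_losses),
                             -np.asarray(nonmember_losses)])
    labels = np.concatenate([np.ones(len(member_losses)),
                             np.zeros(len(nonmember_losses))])
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(member_losses), len(nonmember_losses)
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Perfectly separable losses -> AUC 1.0 (maximal leakage);
# AUC 0.5 is chance, i.e. the attacker learns nothing.
auc_high_leak = mia_auc(member_losses=[0.1, 0.15, 0.2],
                        nonmember_losses=[0.8, 0.9, 1.0])
print(auc_high_leak)  # 1.0
```

Under this reading, FedAvg's ~0.72 means member and non-member losses are substantially separable, while DP-SGD's ~0.25 is below chance: the attacker's decision rule performs worse than guessing.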