A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models
Yash Deo 1, Yan Jia 1, Toni Lassila 2, Victoria J Hodge 1, Alejandro F Frangi 3,4, Chenghao Qian 2, Siyuan Kang 5, Ibrahim Habli 1
Published on arXiv (2602.13066)
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Achieves near-perfect sample-level detection of training data duplicates across three MRI datasets, outperforming generic metrics (FID, CT-score, authenticity) that fail under the characteristics of medical images and common augmentations
Memorization Index (MI) / Overfit/Novelty Index (ONI)
Novel technique introduced
Image generative models are known to duplicate images from their training data in their outputs, which can lead to privacy concerns when they are used for medical image generation. We propose a calibrated per-sample metric for detecting memorization and duplication of training data. Our metric extracts image features using an MRI foundation model, aggregates multi-layer whitened nearest-neighbor similarities, and maps them to bounded *Overfit/Novelty Index* (ONI) and *Memorization Index* (MI) scores. Across three MRI datasets with controlled duplication percentages and typical image augmentations, our metric robustly detects duplication and yields more consistent values across datasets. At the sample level, it achieves near-perfect detection of duplicates.
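The abstract describes a pipeline of feature whitening, nearest-neighbor similarity, and mapping to a bounded score. A minimal sketch of that idea is below; the function names, the ZCA whitening variant, and the exponential calibration to [0,1] are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of a whitened nearest-neighbor memorization score.
# The names `whiten` and `memorization_index`, and the exp(-d/tau)
# calibration, are assumptions; the paper's formulation may differ.
import numpy as np

def whiten(feats, eps=1e-6):
    """ZCA-whiten feature vectors using their empirical covariance."""
    mu = feats.mean(axis=0)
    X = feats - mu
    cov = X.T @ X / max(len(X) - 1, 1)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return X @ W, mu, W

def memorization_index(train_feats, gen_feats, tau=1.0):
    """Map each generated sample's nearest-neighbor distance in the
    whitened training-feature space to a bounded [0, 1] score
    (near 1 = likely duplicate, near 0 = novel)."""
    train_w, mu, W = whiten(train_feats)
    gen_w = (gen_feats - mu) @ W
    # Nearest-neighbor distance from each generated sample to the training set
    d = np.linalg.norm(gen_w[:, None, :] - train_w[None, :, :], axis=-1).min(axis=1)
    return np.exp(-d / tau)  # bounded, monotone-decreasing in distance
```

An exact duplicate of a training feature vector gets a score of 1, while a far-away (novel) sample decays toward 0, which gives the cross-dataset comparability the bounded range is meant to provide.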
Key Contributions
- Calibrated per-sample Memorization Index (MI) and Overfit/Novelty Index (ONI) metrics mapped to a bounded [0,1] range for cross-dataset comparability
- Multi-scale feature aggregation using an MRI-specific foundation model (MRI-CORE) with whitened nearest-neighbor similarities to capture both fine-grained textures and gross anatomy
- Validation across three MRI datasets (brain, knee, spine) demonstrating near-perfect duplicate detection robust to common augmentations such as noise, flips, and small rotations
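The sample-level detection claimed above amounts to thresholding the bounded per-sample score and checking which generated images are flagged as training duplicates. A hedged sketch follows; the 0.5 threshold and the helper names are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of sample-level duplicate detection by thresholding a
# bounded MI score. The 0.5 default threshold is an assumption for
# illustration, not a value reported in the paper.
import numpy as np

def flag_duplicates(mi_scores, threshold=0.5):
    """Return a boolean mask of generated samples flagged as likely
    training-data duplicates."""
    return np.asarray(mi_scores) >= threshold

def detection_rates(flags, is_duplicate):
    """True-positive and false-positive rates for the flagged set,
    given ground-truth duplicate labels."""
    flags = np.asarray(flags)
    y = np.asarray(is_duplicate)
    tpr = (flags & y).sum() / max(int(y.sum()), 1)
    fpr = (flags & ~y).sum() / max(int((~y).sum()), 1)
    return tpr, fpr
```

Near-perfect detection in this framing corresponds to a true-positive rate close to 1 at a false-positive rate close to 0, even when duplicates have been augmented with noise, flips, or small rotations.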
🛡️ Threat Analysis
The paper targets training-data memorization and leakage from generative models: detecting when a model's outputs reproduce private training samples (patient MRI images). While framed as an auditing metric rather than an adversarial attack, the underlying threat model is training-data privacy leakage from model outputs, the core concern of ML03.