
An Investigation of Memorization Risk in Healthcare Foundation Models

Sana Tonekaboni 1,2,3, Lena Stempfle 1,4,5, Adibvafa Fallahpour 6,3,7, Walter Gerych 8, Marzyeh Ghassemi 1

1 citation · 57 references · arXiv


Published on arXiv: 2510.12950

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Demonstrates extractable memorization of patient information from a publicly available EHR foundation model, with elevated privacy risk identified for vulnerable patient subgroups.


Foundation models trained on large-scale de-identified electronic health records (EHRs) hold promise for clinical applications. However, their capacity to memorize patient information raises important privacy concerns. In this work, we introduce a suite of black-box evaluation tests to assess privacy-related memorization risks in foundation models trained on structured EHR data. Our framework includes methods for probing memorization at both the embedding and generative levels, and aims to distinguish between model generalization and harmful memorization in clinically relevant settings. We contextualize memorization in terms of its potential to compromise patient privacy, particularly for vulnerable subgroups. We validate our approach on a publicly available EHR foundation model and release an open-source toolkit to facilitate reproducible and collaborative privacy assessments in healthcare AI.
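The generative-level probing the abstract describes can be illustrated with a minimal sketch: feed a black-box model a prefix of a known training record and check whether it reproduces the held-out continuation verbatim. The function and toy model below are hypothetical illustrations, not the paper's toolkit or its API.

```python
# Sketch of a black-box generative memorization probe (hypothetical API):
# prompt with a prefix of a known training record and test whether the
# model regurgitates the true continuation exactly.

def probe_generative_memorization(generate, records, prefix_len=5):
    """generate(prefix) -> list of predicted next tokens (black-box access).

    Returns the fraction of records whose held-out suffix is reproduced
    verbatim -- a crude extractable-memorization signal.
    """
    hits = 0
    for record in records:
        prefix, suffix = record[:prefix_len], record[prefix_len:]
        if generate(prefix)[:len(suffix)] == suffix:
            hits += 1
    return hits / len(records)

# Toy stand-in for a model that has memorized one patient record verbatim.
MEMORIZED = ["dx:E11", "rx:metformin", "lab:HbA1c", "dx:I10",
             "rx:lisinopril", "lab:LDL", "visit:2020-01"]

def toy_generate(prefix):
    if list(prefix) == MEMORIZED[:len(prefix)]:
        return MEMORIZED[len(prefix):]  # regurgitates the training record
    return ["tok"] * 2                  # generic continuation otherwise

rate = probe_generative_memorization(toy_generate, [MEMORIZED], prefix_len=5)
print(rate)  # 1.0 for the fully memorized record
```

A real evaluation would compare this exact-match rate against a matched set of records the model never saw, to separate memorization from plausible generalization.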


Key Contributions

  • Suite of black-box tests for probing memorization at both embedding and generative levels in EHR foundation models
  • Risk evaluation framework contextualizing memorization risk by patient vulnerability and clinical relevance
  • Open-source toolkit for reproducible privacy assessments of healthcare AI foundation models

🛡️ Threat Analysis

Model Inversion Attack

The paper's core contribution is measuring how much private patient training data can be extracted from EHR foundation models via black-box probing of embeddings and generative outputs — a direct training data extraction/model inversion threat where an adversary recovers information the model was trained on.
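One way to picture an embedding-level signal of this threat: if a model embeds its own training records more stably under small perturbations than unseen records, that stability gap hints at memorization rather than generalization. The sketch below is a hypothetical illustration with a toy embedding function; none of the names come from the paper.

```python
# Sketch of an embedding-level memorization signal (hypothetical setup):
# compare how far a record's embedding moves under a small perturbation,
# for training records vs records the model never saw.
import math

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def embedding_gap(embed, train_records, unseen_records, perturb):
    """Mean embedding shift under perturbation for unseen minus training
    records. Memorized records tend to be unusually stable, so a positive
    gap is a black-box memorization signal."""
    def mean_shift(records):
        return sum(dist(embed(r), embed(perturb(r))) for r in records) / len(records)
    return mean_shift(unseen_records) - mean_shift(train_records)

# Toy stand-in: the "model" snaps records it memorized onto a stored
# prototype, so small perturbations do not move their embeddings at all.
PROTOTYPES = [(1.0, 2.0), (3.0, 1.0)]

def toy_embed(r):
    for proto in PROTOTYPES:
        if dist(r, proto) < 0.5:
            return proto
    return r

def perturb(r):
    return (r[0] + 0.3, r[1])

gap = embedding_gap(toy_embed, PROTOTYPES, [(5.0, 5.0), (7.0, 2.0)], perturb)
print(round(gap, 2))  # 0.3: training records are suspiciously stable
```

In practice the adversary only needs query access to the embedding endpoint, which is what makes this an inference-time, black-box threat.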


Details

Domains: tabular
Model types: transformer
Threat tags: black_box, inference_time
Datasets: de-identified EHR (structured electronic health records)
Applications: electronic health records, clinical AI, healthcare foundation models