Survey · 2026

SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations

Mohoshin Ara Tahera 1, Karamveer Singh Sidhu 2, Shuvalaxmi Dass 1, Sajal Saha 2

0 citations · 90 references · arXiv


Published on arXiv (arXiv:2601.10004)

Threats covered

  • Model Inversion Attack · OWASP ML Top 10, ML03
  • Membership Inference Attack · OWASP ML Top 10, ML04
  • Sensitive Information Disclosure · OWASP LLM Top 10, LLM06
  • Prompt Injection · OWASP LLM Top 10, LLM01

Key Finding

Identifies persistent limitations in existing privacy-preserving techniques for healthcare LLMs and provides a phase-aware roadmap (preprocessing → fine-tuning → inference) for securing sensitive clinical data across diverse operational tiers.


Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs), and enhance patient care. However, this integration introduces significant privacy and security challenges, driven by the sensitivity of clinical data and the high-stakes nature of medical workflows. These risks become even more pronounced across heterogeneous deployment environments, ranging from small on-premise hospital systems to regional health networks, each with unique resource limitations and regulatory demands. This Systematization of Knowledge (SoK) examines the evolving threat landscape across the three core LLM phases (data preprocessing, fine-tuning, and inference) within realistic healthcare settings. We present a detailed threat model that characterizes adversaries, capabilities, and attack surfaces at each phase, and we systematize how existing privacy-preserving techniques (PPTs) attempt to mitigate these vulnerabilities. While existing defenses show promise, our analysis identifies persistent limitations in securing sensitive clinical data across diverse operational tiers. We conclude with phase-aware recommendations and future research directions aimed at strengthening privacy guarantees for LLMs in regulated environments. This work provides a foundation for understanding the intersection of LLMs, threats, and privacy in healthcare, offering a roadmap toward more robust and clinically trustworthy AI systems.


Key Contributions

  • Phase-aligned threat model mapping adversaries, attack surfaces, and enabled attacks across data preprocessing, fine-tuning, and inference for healthcare LLMs
  • Systematization of privacy-preserving techniques (differential privacy, federated learning, secure aggregation, etc.) against specific threats at each LLM phase
  • Identification of persistent gaps and phase-aware recommendations for securing LLMs in heterogeneous healthcare deployment environments
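To make the first privacy-preserving technique named above concrete, here is a minimal sketch of the classic Laplace mechanism for differential privacy, applied to a counting query over patient records. This is an illustrative toy, not the paper's method: the `laplace_count` helper, the cohort data, and the age predicate are all invented for the example.

```python
import numpy as np

def laplace_count(values, predicate, epsilon, rng=None):
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one patient
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon yields an epsilon-DP release.
    """
    if rng is None:
        rng = np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical cohort: patient ages in a small record set (made-up data).
ages = [34, 71, 58, 62, 45, 80, 29]
noisy_over_65 = laplace_count(ages, lambda a: a >= 65, epsilon=1.0)
```

Smaller `epsilon` means stronger privacy but noisier answers; the same trade-off drives the DP-SGD-style training defenses the survey systematizes.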

🛡️ Threat Analysis

Model Inversion Attack

Covers training data reconstruction and gradient leakage attacks relevant to healthcare settings, including federated learning scenarios where adversaries attempt to recover PHI from shared gradients or model outputs.
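The gradient-leakage risk described above can be shown in a few lines. This is an illustrative toy, not an attack from the paper: assuming a single linear layer with an MSE loss, the weight gradient a federated client shares is a rank-1 outer product, so an honest-but-curious server recovers the private input's direction exactly.

```python
import numpy as np

# Toy gradient-leakage sketch: one linear layer y = W @ x.
rng = np.random.default_rng(0)
x = rng.normal(size=4)          # private input (e.g., encoded PHI features)
W = rng.normal(size=(3, 4))     # layer weights, known to the server

y = W @ x
g = 2 * y                       # dL/dy for the loss L = ||y||^2
grad_W = np.outer(g, x)         # dL/dW = g x^T  (the shared update)

# grad_W is rank-1: every nonzero row is a scaled copy of x, so the
# server can read off the private input's direction from one row.
x_dir = grad_W[0] / np.linalg.norm(grad_W[0])
cosine = abs(x_dir @ (x / np.linalg.norm(x)))  # 1.0 up to float error
```

Defenses the survey covers, such as secure aggregation and gradient noising, aim to break exactly this per-client recoverability.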

Membership Inference Attack

Membership inference against clinical LLMs is treated explicitly as a key threat: adversaries determine whether a specific patient's records were included in the training data, and the associated defenses are systematized.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
training_time, inference_time, black_box, white_box
Applications
clinical decision support, electronic health record summarization, clinical documentation, patient-facing chatbots