benchmark 2025

A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

Jinghao Wang , Ping Zhang , Carter Yagemann

The Ohio State University

0 citations · 42 references · arXiv

Published on arXiv

2512.08185

Prompt Injection

OWASP LLM Top 10 — LLM01

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Framework specification (threat models, synthetic data methodology, evaluation protocols, scoring rubrics) enabling reproducible security benchmarking of medical LLMs without GPU clusters, commercial API access, or protected health data.

Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems.

Key Contributions

Multi-specialty threat model stratified by clinical risk (emergency medicine, psychiatry, general practice) covering both jailbreaking and privacy extraction attack scenarios
Accessible evaluation design that runs on consumer CPU hardware using freely available models and synthetic patient records requiring no IRB approval
Standardized evaluation protocol with scoring rubrics adapted from established security research for comparative assessment of medical LLM defenses

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_time

Applications

clinical decision supportmedical question answeringmedical llms

Read PDF arXiv DOI

A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction

Bits Leaked per Query: Information-Theoretic Bounds on Adversarial Attacks against LLMs

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Quantifying Return on Security Controls in LLM Systems

Evaluating Language Model Reasoning about Confidential Information

Benchmarking LLAMA Model Security Against OWASP Top 10 For LLM Applications

In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b