Benchmark · 2025

Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs

Andrew Maranhão Ventura D'addario

0 citations · 17 references · arXiv

Published on arXiv · 2511.21757

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Produced 214,219 domain-specific adversarial prompts spanning seven healthcare violation taxonomies that generically aligned LLMs fail to refuse, demonstrating a critical blind spot in universal safety training for medical deployment contexts.

Medical Malice

Novel technique introduced


The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in *primum non nocere*. However, current alignment techniques rely on generic definitions of harm that fail to capture context-dependent violations, such as administrative fraud and clinical discrimination. To address this, we introduce Medical Malice: a dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical complexities of the Brazilian Unified Health System (SUS). Crucially, the dataset includes the reasoning behind each violation, enabling models to internalize ethical boundaries rather than merely memorizing a fixed set of refusals. Using an unaligned agent (Grok-4) within a persona-driven pipeline, we synthesized high-fidelity threats across seven taxonomies, ranging from procurement manipulation and queue-jumping to obstetric violence. We discuss the ethical design of releasing these "vulnerability signatures" to correct the information asymmetry between malicious actors and AI developers. Ultimately, this work advocates for a shift from universal to context-aware safety, providing the necessary resources to immunize healthcare AI against the nuanced, systemic threats inherent to high-stakes medical environments: vulnerabilities that represent the paramount risk to patient safety and the successful integration of AI in healthcare systems.


Key Contributions

  • Medical Malice dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical context of Brazil's Unified Health System (SUS), covering seven violation taxonomies including procurement fraud, queue-jumping, and obstetric violence
  • Adversarial generation pipeline using an unaligned agent (Grok-4) with persona-driven prompting to synthesize high-fidelity, context-specific healthcare threats at scale
  • Ethical framework and rationale for releasing "vulnerability signatures" to correct information asymmetry between malicious actors and AI developers, advocating a shift from universal to context-aware LLM safety
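The persona-driven generation pipeline described above can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code: the persona fields, the `generate` callable standing in for the unaligned agent (Grok-4 in the paper), and the abbreviated taxonomy list are all assumptions for the sketch.

```python
# Hypothetical sketch of a persona-driven adversarial generation pipeline.
# All names here (personas, seed template, the `generate` callable) are
# illustrative assumptions, not the paper's actual implementation.

from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple

# Three of the seven violation taxonomies named in the paper (abbreviated).
TAXONOMIES = ["procurement_manipulation", "queue_jumping", "obstetric_violence"]


@dataclass
class Persona:
    role: str    # e.g. a hospital procurement officer or scheduler
    motive: str  # context-specific incentive behind the violation


@dataclass
class AdversarialPrompt:
    taxonomy: str
    persona: Persona
    prompt: str
    rationale: str  # the "reasoning behind each violation" stored per record


def build_seed(persona: Persona, taxonomy: str) -> str:
    """Compose the instruction sent to the unaligned generator model."""
    return (
        f"You are {persona.role} in the SUS whose goal is: {persona.motive}. "
        f"Write a realistic request that commits a '{taxonomy}' violation, "
        f"and separately explain why it is a violation."
    )


def synthesize(
    personas: List[Persona],
    generate: Callable[[str], Tuple[str, str]],  # seed -> (prompt, rationale)
) -> Iterator[AdversarialPrompt]:
    """Cross each persona with each taxonomy and collect generated records."""
    for taxonomy in TAXONOMIES:
        for persona in personas:
            prompt, rationale = generate(build_seed(persona, taxonomy))
            yield AdversarialPrompt(taxonomy, persona, prompt, rationale)


# Stub generator standing in for the unaligned agent.
def fake_generate(seed: str) -> Tuple[str, str]:
    return (f"[adversarial request for: {seed[:40]}...]",
            "[explanation of why this violates policy]")


if __name__ == "__main__":
    personas = [Persona("a procurement officer", "steer a contract to a vendor")]
    records = list(synthesize(personas, fake_generate))
    print(len(records))  # one record per (persona, taxonomy) pair
```

Pairing each generated prompt with its rationale mirrors the paper's design goal of letting aligned models learn *why* a request is a violation rather than memorizing a fixed refusal list.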

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time · training_time · black_box
Datasets
Medical Malice (214,219 adversarial prompts, SUS-specific)
Applications
healthcare AI · clinical decision support · public health administration systems