Adversarially-Aware Architecture Design for Robust Medical AI Systems
Alyssa Gerhart, Balaji Iyangar
Published on arXiv (2510.23622)
- Input Manipulation Attack (OWASP ML Top 10: ML01)
- Data Poisoning Attack (OWASP ML Top 10: ML02)
Key Finding
Adversarial attacks (FGSM, PGD, Nightshade) significantly degrade classification accuracy on dermatological images, while adversarial training and distillation partially recover robustness at the cost of clean-data performance.
Adversarial attacks pose a severe risk to AI systems used in healthcare, capable of misleading models into dangerous misclassifications that can delay treatment or cause misdiagnoses. These attacks, often imperceptible to humans, threaten patient safety, particularly in underserved populations. Our study explores these vulnerabilities through empirical experimentation on a dermatological dataset, where adversarial methods significantly reduce classification accuracy. Through detailed threat modeling, experimental benchmarking, and model evaluation, we demonstrate both the severity of the threat and the partial success of defenses such as adversarial training and distillation. Our results show that while defenses reduce attack success rates, they must be balanced against model performance on clean data. We conclude with a call for integrated technical, ethical, and policy-based approaches to build more resilient, equitable AI in healthcare.
Key Contributions
- Empirical evaluation of adversarial evasion attacks (FGSM, PGD) and data poisoning (Nightshade) on a dermatological skin cancer dataset (ISIC)
- Benchmarking of defenses—adversarial training, defensive distillation, and hybrid methods—against both attack types in a medical imaging context
- Characterization of the robustness-accuracy trade-off and a call for integrated technical, ethical, and policy-based approaches for medical AI resilience
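The robustness-accuracy trade-off characterized above arises because adversarial training mixes perturbed examples into the training objective. As a hedged illustration only (a toy NumPy logistic-regression model on synthetic data, not the paper's dermatology classifiers; the `mix` parameter and helper names are assumptions for this sketch), the following shows the basic loop:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, b, eps):
    # One-step input-gradient-sign perturbation for logistic regression:
    # dL/dx of binary cross-entropy is (p - y) * w.
    p = sigmoid(X @ w + b)
    return X + eps * np.sign((p - y)[:, None] * w)

def adversarial_train(X, y, eps=0.5, lr=0.5, epochs=200, mix=0.5):
    """Train on a weighted mix of clean and FGSM-perturbed examples.

    mix controls the robustness-accuracy trade-off: 0.0 is standard
    training on clean data only, 1.0 trains on adversarial examples only.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1]) * 0.01
    b = 0.0
    for _ in range(epochs):
        X_adv = fgsm(X, y, w, b, eps)          # attack the current model
        X_batch = np.vstack([X, X_adv])
        y_batch = np.concatenate([y, y])
        weight = np.concatenate([np.full(len(y), 1 - mix),
                                 np.full(len(y), mix)])
        p = sigmoid(X_batch @ w + b)
        err = (p - y_batch) * weight           # weighted BCE gradient term
        w -= lr * X_batch.T @ err / len(y_batch)
        b -= lr * err.sum() / len(y_batch)
    return w, b

# Synthetic two-class data standing in for image features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 0.5, (50, 2)), rng.normal(1.5, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = adversarial_train(X, y)
clean_acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
robust_acc = np.mean((sigmoid(fgsm(X, y, w, b, 0.5) @ w + b) > 0.5) == y)
print(clean_acc, robust_acc)
```

Sweeping `mix` between 0 and 1 on real data would trace out the trade-off curve the paper describes: higher `mix` typically raises robust accuracy while lowering clean accuracy.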
🛡️ Threat Analysis
The central empirical focus is on FGSM and PGD evasion attacks at inference time against dermatological image classifiers, together with an evaluation of adversarial training and defensive distillation as defenses.
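FGSM takes a single step of size ε in the direction of the sign of the input gradient, and PGD iterates smaller steps while projecting back into the ε-ball around the original input. The sketch below illustrates both on a toy NumPy logistic-regression classifier; it is a minimal stand-in, not the paper's dermatology models, and the toy weights are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM: x_adv = x + eps * sign(dL/dx).

    For binary cross-entropy with p = sigmoid(w.x + b), dL/dx = (p - y) * w.
    """
    p = sigmoid(w @ x + b)
    return x + eps * np.sign((p - y) * w)

def pgd(x, y, w, b, eps, alpha, steps):
    """PGD: iterated FGSM steps of size alpha, projected onto the
    L-infinity eps-ball around the original input."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_adv + b)
        x_adv = x_adv + alpha * np.sign((p - y) * w)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back
    return x_adv

# Toy classifier: predicts class 1 when x[0] + x[1] > 0.
w, b = np.array([1.0, 1.0]), 0.0
x, y = np.array([0.6, 0.6]), 1.0   # clean input, correctly classified

x_fgsm = fgsm(x, y, w, b, eps=0.7)
x_pgd = pgd(x, y, w, b, eps=0.7, alpha=0.3, steps=5)

print(sigmoid(w @ x + b) > 0.5)       # True  (clean prediction correct)
print(sigmoid(w @ x_fgsm + b) > 0.5)  # False (FGSM flips the label)
print(sigmoid(w @ x_pgd + b) > 0.5)   # False (PGD flips it too)
```

On real images the same construction applies pixel-wise, with ε small enough that the perturbation is imperceptible to humans.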
The study explicitly experiments with Nightshade, injecting poisoned training samples into the ISIC dataset to corrupt the training pipeline and induce misclassification: a core data poisoning scenario.
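Nightshade itself crafts optimized, visually subtle poison images, and its full procedure is beyond a short sketch. As a much simpler stand-in for the same threat class (tainting the training set so the model learns wrong associations), the following flips a fraction of training labels toward a target class; `poison_labels` and its parameters are hypothetical helpers for this illustration, not the paper's method:

```python
import numpy as np

def poison_labels(y, rate, target_class=1, seed=0):
    """Flip a fraction `rate` of non-target labels to `target_class`.

    Plain label flipping, far cruder than Nightshade's optimized image
    perturbations, but it corrupts the pipeline in the same spirit:
    the model trains on tainted supervision.
    """
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    victims = np.flatnonzero(y != target_class)      # candidate labels to flip
    n_flip = int(rate * len(victims))
    flip_idx = rng.choice(victims, size=n_flip, replace=False)
    y_poisoned[flip_idx] = target_class
    return y_poisoned

# 50 benign (0) and 50 malignant (1) labels; poison 30% of the benign ones.
y = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
y_poisoned = poison_labels(y, rate=0.3)
print((y_poisoned != y).sum())  # 15 labels flipped
```

Training any classifier on `y_poisoned` instead of `y` then degrades accuracy on the victim class, which is the failure mode the poisoning experiments measure.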