
Adversarially-Aware Architecture Design for Robust Medical AI Systems

Alyssa Gerhart, Balaji Iyangar

1 citation · 11 references · arXiv


Published on arXiv: 2510.23622

Input Manipulation Attack (OWASP ML Top 10 — ML01)

Data Poisoning Attack (OWASP ML Top 10 — ML02)

Key Finding

Adversarial attacks (FGSM, PGD, Nightshade) significantly degrade classification accuracy on dermatological images, while adversarial training and distillation partially recover robustness at the cost of clean-data performance.


Adversarial attacks pose a severe risk to AI systems used in healthcare, capable of misleading models into dangerous misclassifications that can delay treatment or cause misdiagnoses. These attacks, often imperceptible to humans, threaten patient safety, particularly in underserved populations. Our study explores these vulnerabilities through empirical experimentation on a dermatological dataset, where adversarial methods significantly reduce classification accuracy. Through detailed threat modeling, experimental benchmarking, and model evaluation, we demonstrate both the severity of the threat and the partial success of defenses such as adversarial training and distillation. Our results show that while defenses reduce attack success rates, they must be balanced against model performance on clean data. We conclude with a call for integrated technical, ethical, and policy-based approaches to build more resilient, equitable AI in healthcare.


Key Contributions

  • Empirical evaluation of adversarial evasion attacks (FGSM, PGD) and data poisoning (Nightshade) on a dermatological skin cancer dataset (ISIC)
  • Benchmarking of defenses—adversarial training, defensive distillation, and hybrid methods—against both attack types in a medical imaging context
  • Characterization of the robustness-accuracy trade-off and a call for integrated technical, ethical, and policy-based approaches for medical AI resilience
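The robustness-accuracy trade-off named above can be illustrated on a toy model. The sketch below is an assumption for illustration only: a linear logistic classifier on synthetic features (not the paper's CNN on ISIC), where one feature is robust but only 90%-reliable and many fragile features are perfectly predictive yet smaller than the attack budget `eps`. For a linear model, the worst-case L-infinity perturbation has a closed form, so adversarial training simply optimizes the penalized margin `y*(w·x) - eps*||w||_1`.

```python
import numpy as np

# Hypothetical toy setup: one robust feature (right 90% of the time,
# magnitude 1) plus 20 fragile features (always right, magnitude eta < eps).
rng = np.random.default_rng(1)
n, d_fragile, eta, eps = 1000, 20, 0.05, 0.2

y = rng.choice([-1.0, 1.0], size=n)
x_robust = y * np.where(rng.random(n) < 0.9, 1.0, -1.0)
x_fragile = np.outer(y, np.full(d_fragile, eta))
X = np.column_stack([x_robust, x_fragile])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, eps=0.0, steps=300, lr=0.5):
    """Logistic regression; eps > 0 trains on the worst-case L-inf
    perturbation, which for a linear model is margin - eps*||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margin = y * (X @ w) - eps * np.abs(w).sum()
        s = sigmoid(-margin)  # per-example logistic-loss gradient factor
        grad = (s[:, None] * (-y[:, None] * X + eps * np.sign(w))).mean(axis=0)
        w -= lr * grad
    return w

def accuracy(X, y, w, eps=0.0):
    """eps=0 gives clean accuracy; eps>0 gives worst-case (robust) accuracy."""
    return float(np.mean(y * (X @ w) - eps * np.abs(w).sum() > 0))

w_std = train(X, y)            # standard training
w_adv = train(X, y, eps=eps)   # adversarial training

print("clean :", accuracy(X, y, w_std), accuracy(X, y, w_adv))
print("robust:", accuracy(X, y, w_std, eps), accuracy(X, y, w_adv, eps))
```

The adversarially trained model leans on the robust feature, improving worst-case accuracy but capping clean accuracy near 90%, which mirrors the trade-off the paper reports at a much larger scale.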

🛡️ Threat Analysis

Input Manipulation Attack

The study's central empirical focus is FGSM and PGD evasion attacks at inference time against dermatological image classifiers, together with an evaluation of adversarial training and distillation as defenses.
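Both attacks perturb an input within an L-infinity ball of radius epsilon to increase the classification loss. The sketch below is an assumption for illustration: a toy logistic-regression "classifier" where the input gradient has a closed form, standing in for the CNNs on dermatological images that the paper actually attacks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(x, y, w, b, epsilon):
    """FGSM: a single step of size epsilon along the gradient's sign."""
    return x + epsilon * np.sign(input_grad(x, y, w, b))

def pgd(x, y, w, b, epsilon, alpha=0.1, steps=20):
    """PGD: iterated FGSM steps, projected back into the epsilon-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

# Hypothetical model weights and one input with true label 1.
rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
x, y = rng.normal(size=8), 1.0

x_fgsm = fgsm(x, y, w, b, epsilon=0.5)
x_pgd = pgd(x, y, w, b, epsilon=0.5)

# Confidence in the true class drops under both attacks.
print(sigmoid(w @ x + b), sigmoid(w @ x_fgsm + b), sigmoid(w @ x_pgd + b))
```

PGD is the stronger multi-step variant: it repeats the FGSM step and clips after each one so the perturbation never leaves the epsilon-ball around the original input.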

Data Poisoning Attack

The study explicitly experiments with Nightshade, injecting poisoned training samples into the ISIC dataset to corrupt the training pipeline and induce misclassification, the core of a data-poisoning attack.
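The mechanism can be sketched on a toy scale. The example below is an assumption for illustration: a nearest-centroid classifier on synthetic 2-D data with a crude mislabeled-sample injection, a stand-in for the far subtler perturbation-based poisoning Nightshade performs on ISIC. The point it shows is the same: corrupting the training set degrades accuracy on clean data at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean training set: class 0 clustered near (-2,-2), class 1 near (+2,+2).
X = np.concatenate([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def train_centroids(X, y):
    """'Train' a nearest-centroid classifier: just the two class means."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def accuracy(X, y, c0, c1):
    pred = (np.linalg.norm(X - c1, axis=1)
            < np.linalg.norm(X - c0, axis=1)).astype(int)
    return float(np.mean(pred == y))

clean_acc = accuracy(X, y, *train_centroids(X, y))

# Poison: inject 300 points deep in class-0 territory, labeled class 1,
# dragging the learned class-1 centroid across the true boundary.
X_poison = np.full((300, 2), -6.0)
X_train = np.concatenate([X, X_poison])
y_train = np.concatenate([y, np.ones(300, dtype=int)])
poisoned_acc = accuracy(X, y, *train_centroids(X_train, y_train))

print(f"clean accuracy: {clean_acc:.2f}, poisoned accuracy: {poisoned_acc:.2f}")
```

Real poisoning attacks like Nightshade are much stealthier, keeping the poisoned samples perceptually plausible, but the pipeline-level effect, a corrupted training distribution producing a misbehaving model, is the same.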


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, training_time, inference_time, targeted, digital
Datasets
ISIC
Applications
medical imaging, dermatology, skin cancer classification