Universal and Transferable Attacks on Pathology Foundation Models
Yuntian Wang 1, Xilin Yang 1, Che-Yung Shen 1, Nir Pillar 2, Aydogan Ozcan 1
Published on arXiv: 2510.16660
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
A single visually imperceptible fixed noise pattern causes significant performance drops across multiple state-of-the-art pathology foundation models, including black-box models unseen during attack optimization.
UTAP (Universal and Transferable Adversarial Perturbations)
Novel technique introduced
We introduce Universal and Transferable Adversarial Perturbations (UTAP) for pathology foundation models, revealing critical vulnerabilities in their feature representation capabilities. Optimized using deep learning, UTAP is a fixed, weak noise pattern that, when added to a pathology image, systematically disrupts the feature representations of multiple pathology foundation models. As a result, UTAP degrades downstream tasks that build on these foundation models, causing misclassification across a wide range of unseen data distributions. Beyond compromising model performance, we demonstrate two key properties of UTAP: (1) universality: the perturbation can be applied across diverse fields of view, independent of the dataset on which UTAP was developed; and (2) transferability: the perturbation successfully degrades the performance of various external, black-box pathology foundation models never seen during optimization. Together, these properties indicate that UTAP is not a dedicated attack tied to a specific foundation model or image dataset, but rather a broad threat to emerging pathology foundation models and their applications. We systematically evaluated UTAP across various state-of-the-art pathology foundation models on multiple datasets, causing significant performance drops with visually imperceptible modifications to the input images using a fixed noise pattern. The development of such potent attacks establishes a critical, high-standard benchmark for evaluating model robustness, highlighting the need for stronger defense mechanisms and potentially providing the assets required for adversarial training to ensure the safe and reliable deployment of AI in pathology.
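The optimization described above can be sketched in miniature: a single perturbation δ, bounded in L∞ norm so it stays visually imperceptible, is updated by sign-gradient ascent to maximize the feature-space shift it induces. The sketch below stands in a toy linear feature extractor (with an analytic gradient) for a real pathology foundation model; the function names, ε budget, step size, and iteration count are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def optimize_universal_perturbation(W, eps=0.03, step=0.005, iters=200, seed=0):
    """PGD-style search for one fixed noise pattern delta that maximizes
    the feature shift ||W(x + delta) - W x||^2 = ||W delta||^2, subject to
    the imperceptibility budget ||delta||_inf <= eps.

    W: (d_feat, d_in) toy linear 'feature extractor' standing in for a
       pathology foundation model (an assumption for this sketch); for a
       real model the gradient would come from backpropagation, averaged
       over a batch of training tiles.
    """
    rng = np.random.default_rng(seed)
    d_in = W.shape[1]
    delta = rng.uniform(-eps, eps, size=d_in) * 0.01  # small random init
    G = W.T @ W  # gradient of ||W delta||^2 w.r.t. delta is 2 * G @ delta
    for _ in range(iters):
        grad = 2.0 * G @ delta
        # ascend on the feature-disruption loss, then project back into
        # the L-infinity ball of radius eps
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta

def feature_shift(W, delta):
    """Magnitude of the feature-space displacement caused by delta."""
    return float(np.linalg.norm(W @ delta))
```

In the paper's setting the loss targets the feature representations of surrogate foundation models rather than a linear map, which is what makes the resulting pattern transfer to black-box models; the projection step that enforces the ε budget is what keeps the pattern visually imperceptible.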
Key Contributions
- Universal adversarial perturbation (UTAP): a single fixed noise pattern that disrupts the feature representations of multiple pathology foundation models across diverse fields of view and datasets
- Demonstrates black-box transferability: UTAP degrades external pathology foundation models never seen during optimization
- Systematic robustness benchmark across state-of-the-art pathology foundation models, establishing a high standard for evaluating and motivating defenses
🛡️ Threat Analysis
UTAP is a gradient-optimized universal adversarial perturbation applied to pathology images at inference time, causing misclassification and feature disruption across multiple unseen foundation models — a classic input manipulation attack with universality and black-box transferability.
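At inference time the attack reduces to adding the same precomputed pattern to every input tile and clipping to the valid pixel range, which keeps the per-pixel change within the imperceptibility budget. A minimal sketch of that application step (array shapes and the ε value are illustrative assumptions):

```python
import numpy as np

def apply_utap(images, delta, eps=8 / 255):
    """Add one fixed universal perturbation to a whole batch of tiles.

    images: (n, h, w, c) float array with pixel values in [0, 1]
    delta:  (h, w, c) universal pattern; clipped here so that
            ||delta||_inf <= eps (the imperceptibility budget)
    """
    delta = np.clip(delta, -eps, eps)                # enforce the budget
    adv = np.clip(images + delta[None], 0.0, 1.0)    # stay in valid range
    return adv
```

Because the pattern is fixed, this step needs no model access or per-image optimization, which is what makes the attack practical against deployed black-box pipelines.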