Adversarial Evasion Attacks on Computer Vision using SHAP Values
Frank Mollard, Marcus Becker, Florian Roehrbein
Published on arXiv (2601.10587)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
SHAP-based adversarial attacks generate more robust misclassifications than FGSM, particularly in gradient hiding scenarios where gradient-based attacks typically fail.
SHAP Attack
Novel technique introduced
The paper introduces a white-box attack on computer vision models using SHAP values. It demonstrates how adversarial evasion attacks can compromise the performance of deep learning models by reducing output confidence or inducing misclassifications. Such attacks are particularly insidious because they deceive the model while remaining imperceptible to the human eye. The proposed attack uses SHAP values to quantify the contribution of individual input features to the model's output at inference time, then perturbs the most influential ones. The authors compare the SHAP attack against the well-known Fast Gradient Sign Method (FGSM) and find evidence that SHAP attacks are more robust in generating misclassifications, particularly in gradient-hiding scenarios.
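The core mechanic (rank input features by their Shapley value, then perturb the most influential ones toward a baseline) can be sketched on a toy linear scorer. Everything below is an illustrative assumption, not the paper's setup: the authors attack deep vision models via the SHAP library, whereas this sketch estimates Shapley values by Monte Carlo permutation sampling over eight "pixel" features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a vision model: a linear scorer over 8 "pixel" features.
# (Illustrative assumption -- the paper uses deep CV models and the SHAP library.)
w = np.array([3.0, -2.0, 1.5, 0.5, -1.0, 2.5, -0.5, 1.0])

def model(x):
    """Scalar confidence score for the 'correct' class."""
    return float(w @ x)

def shapley_values(x, baseline, n_samples=500):
    """Monte Carlo permutation estimate of per-feature Shapley values."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        z = baseline.copy()
        prev = model(z)
        for i in rng.permutation(d):
            z[i] = x[i]                 # reveal feature i
            cur = model(z)
            phi[i] += cur - prev        # its marginal contribution
            prev = cur
    return phi / n_samples

def shap_attack(x, baseline, k=3):
    """Occlude the k features whose Shapley values most support the class."""
    phi = shapley_values(x, baseline)
    top = np.argsort(-phi)[:k]          # largest positive attributions
    x_adv = x.copy()
    x_adv[top] = baseline[top]          # push them to the baseline value
    return x_adv

x = np.ones(8)
baseline = np.zeros(8)
x_adv = shap_attack(x, baseline)
print(model(x), model(x_adv))           # confidence drops after the attack
```

For a linear model the Shapley value of feature i is exactly `w[i] * (x[i] - baseline[i])`, so occluding the top positively attributed features provably lowers the score; a real attack would additionally bound the perturbation to keep it imperceptible.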
Key Contributions
- Introduces a white-box adversarial evasion attack using SHAP values to identify and perturb the most influential input pixels/features
- Demonstrates that SHAP-based attacks are more effective than FGSM in gradient-hiding scenarios
- Provides comparative empirical evaluation of SHAP attacks versus FGSM on deep learning computer vision models
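For contrast, the FGSM baseline used in the comparison can be sketched on a toy logistic classifier. The weights, label, and epsilon below are illustrative assumptions; FGSM itself is the standard one-step method `x_adv = x + eps * sign(grad_x loss)`, which is exactly what fails when gradients are hidden or masked.

```python
import numpy as np

# Toy logistic classifier standing in for a vision model.
# (Illustrative assumption -- the paper evaluates FGSM on deep CV models.)
w = np.array([3.0, -2.0, 1.5, 0.5, -1.0, 2.5, -0.5, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, eps=0.25):
    """One-step FGSM: x + eps * sign of the input gradient of the loss."""
    p = sigmoid(w @ x)
    grad = (p - y) * w          # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad)

x = np.ones(8)
x_adv = fgsm(x, y=1.0)
print(sigmoid(w @ x), sigmoid(w @ x_adv))   # confidence in class 1 drops
```

Unlike the SHAP attack, this step depends entirely on `grad`; if the model masks or obfuscates its gradients, `np.sign(grad)` becomes uninformative while SHAP attributions, computed from forward passes alone, remain usable.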
🛡️ Threat Analysis
Directly proposes a white-box adversarial evasion attack that crafts imperceptible perturbations to cause misclassification at inference time, using SHAP values to identify and manipulate the most influential input features — a classic adversarial example attack.