
Towards Adversarial Robustness and Uncertainty Quantification in DINOv2-based Few-Shot Anomaly Detection

Akib Mohammed Khan, Bartosz Krawczyk

0 citations · 38 references · arXiv


Published on arXiv: 2510.13643

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

FGSM causes consistent degradation across all metrics on both MVTec-AD and VisA, while Platt-scaled posteriors yield significantly higher predictive entropy on adversarial inputs than clean ones, enabling reliable attack flagging.


Foundation models such as DINOv2 have shown strong performance in few-shot anomaly detection, yet two key questions remain unexamined: (i) how susceptible are these detectors to adversarial perturbations; and (ii) how well do their anomaly scores reflect calibrated uncertainty? Building on AnomalyDINO, a training-free deep nearest-neighbor detector over DINOv2 features, we present one of the first systematic studies of adversarial attacks and uncertainty estimation in this setting. To enable white-box gradient attacks while preserving test-time behavior, we attach a lightweight linear head to frozen DINOv2 features only for crafting perturbations. Using this heuristic, we evaluate the impact of FGSM across the MVTec-AD and VisA datasets and observe consistent drops in F1, AUROC, AP, and G-mean, indicating that imperceptible perturbations can flip nearest-neighbor relations in feature space to induce confident misclassification. Complementing robustness, we probe reliability and find that raw anomaly scores are poorly calibrated, revealing a gap between confidence and correctness that limits safety-critical use. As a simple, strong baseline toward trustworthiness, we apply post-hoc Platt scaling to the anomaly scores for uncertainty estimation. The resulting calibrated posteriors yield significantly higher predictive entropy on adversarially perturbed inputs than on clean ones, enabling a practical flagging mechanism for attack detection while reducing calibration error (ECE). Our findings surface concrete vulnerabilities in DINOv2-based few-shot anomaly detectors and establish an evaluation protocol and baseline for robust, uncertainty-aware anomaly detection. We argue that adversarial robustness and principled uncertainty quantification are not optional add-ons but essential capabilities if anomaly detection systems are to be trustworthy and ready for real-world deployment.
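The surrogate-head heuristic described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: a frozen random linear map stands in for the DINOv2 backbone, and a lightweight linear head supplies the gradient for a single FGSM step under an L-infinity budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's setup: a *frozen* linear feature map A plays
# the role of the DINOv2 backbone (hypothetical; the real model is a ViT),
# and a lightweight linear head w is attached only to craft perturbations.
D_IN, D_FEAT = 64, 16
A = rng.standard_normal((D_FEAT, D_IN)) / np.sqrt(D_IN)  # frozen "backbone"
w = rng.standard_normal(D_FEAT) / np.sqrt(D_FEAT)        # surrogate linear head

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, eps):
    """One FGSM step: x + eps * sign(grad_x BCE(head(A @ x), y))."""
    z = w @ (A @ x)
    # d(BCE)/dz = sigmoid(z) - y; chain rule through the two linear layers:
    grad_x = (sigmoid(z) - y) * (A.T @ w)
    return x + eps * np.sign(grad_x)

x = rng.standard_normal(D_IN)
y = 0.0                        # clean sample labelled "normal"
x_adv = fgsm(x, y, eps=0.03)   # small L_inf budget, imperceptible in pixel space

# FGSM maximises the loss, so the anomaly logit moves upward for a normal input.
print(sigmoid(w @ (A @ x)), sigmoid(w @ (A @ x_adv)))
```

Because the head and backbone here are linear, the gradient is exact; in the paper the same one-step sign-of-gradient attack is backpropagated through frozen DINOv2 features without changing test-time behavior.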


Key Contributions

  • A lightweight linear head heuristic attached to frozen DINOv2 features to enable white-box gradient-based (FGSM) attacks on training-free nearest-neighbor anomaly detectors without altering test-time behavior.
  • Systematic evaluation showing imperceptible FGSM perturbations cause consistent drops in F1, AUROC, AP, and G-mean across MVTec-AD and VisA, demonstrating that DINOv2-based few-shot anomaly detectors are vulnerable to adversarial evasion.
  • Post-hoc Platt scaling baseline that calibrates raw anomaly scores, reducing ECE and producing significantly higher predictive entropy on adversarially perturbed inputs, enabling a practical attack-detection flagging mechanism.
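The Platt-scaling-plus-entropy flagging idea from the last contribution can be sketched as follows. The score distributions are synthetic, and the assumption that attacked inputs drift toward the decision boundary is made purely for illustration; it is consistent with the paper's finding that calibrated posteriors show higher predictive entropy on adversarial inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

def platt_fit(scores, labels, lr=0.1, steps=2000):
    """Fit p(anomaly | s) = sigmoid(a*s + b) by gradient descent (Platt scaling)."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        g = p - labels                    # d(BCE)/d(logit)
        a -= lr * np.mean(g * scores)
        b -= lr * np.mean(g)
    return a, b

def entropy(p):
    """Binary predictive entropy in nats (max log 2 at p = 0.5)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Synthetic raw anomaly scores: normals low, anomalies high.
s_clean = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])
y = np.concatenate([np.zeros(200), np.ones(200)])
a, b = platt_fit(s_clean, y)

# Assumed attack effect: perturbed inputs land near the decision boundary,
# so their calibrated posteriors sit close to 0.5 and carry high entropy.
s_adv = rng.normal(0, 0.3, 200)
p_clean = 1.0 / (1.0 + np.exp(-(a * s_clean + b)))
p_adv = 1.0 / (1.0 + np.exp(-(a * s_adv + b)))
print(entropy(p_clean).mean(), entropy(p_adv).mean())
```

Thresholding the per-input entropy then yields the practical attack-flagging mechanism: inputs whose calibrated posterior entropy exceeds the clean-data range are flagged for review.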

🛡️ Threat Analysis

Input Manipulation Attack

The paper crafts FGSM perturbations that manipulate nearest-neighbor relations in DINOv2 feature space, causing misclassification at inference time, and it proposes calibrated uncertainty (Platt scaling plus predictive entropy) as a practical flagging mechanism for detecting these attacks. Both the attack evaluation and the uncertainty-based defense fall squarely within Input Manipulation Attack territory.
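The nearest-neighbor relations being attacked are those of an AnomalyDINO-style detector, which can be sketched minimally. This is a hypothetical illustration: random vectors stand in for DINOv2 patch features, and the anomaly score is simply the distance to the nearest feature in a few-shot memory bank of nominal samples.

```python
import numpy as np

rng = np.random.default_rng(2)

# Memory bank of features from the few normal reference shots
# (random vectors stand in for real DINOv2 patch features).
bank = rng.standard_normal((500, 32))

def nn_score(f, bank):
    """Training-free anomaly score: distance to the nearest nominal feature."""
    return np.min(np.linalg.norm(bank - f, axis=1))

f_normal = bank[0] + 0.01 * rng.standard_normal(32)  # lies near the bank
f_anom = bank[0] + 5.0 * rng.standard_normal(32)     # lies far from the bank
print(nn_score(f_normal, bank), nn_score(f_anom, bank))
```

An adversarial perturbation only needs to shift a test feature relative to its nearest bank entries to flip this score, which is why imperceptible pixel-space changes can induce confident misclassification.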


Details

Domains
vision
Model Types
transformer
Threat Tags
white_box, inference_time, digital
Datasets
MVTec-AD, VisA
Applications
anomaly detection, industrial inspection