Defense · 2025

Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model

Charmaine Barker, Daniel Bethell, Simos Gerasimou

1 citation · 39 references · arXiv


Published on arXiv: 2509.24492

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

GUIDE improves adversarial attack detection by ~80% and OOD detection by ~77% over state-of-the-art post-hoc uncertainty quantification baselines without modifying the base model.

GUIDE — novel technique introduced


Reliable uncertainty quantification remains a major obstacle to the deployment of deep learning models under distributional shift. Existing post-hoc approaches that retrofit pretrained models either inherit misplaced confidence or merely reshape predictions, without teaching the model when to be uncertain. We introduce GUIDE, a lightweight evidential learning meta-model approach that attaches to a frozen deep learning model and explicitly learns how and when to be uncertain. GUIDE identifies salient internal features via a calibration stage, and then employs these features to construct a noise-driven curriculum that teaches the model how and when to express uncertainty. GUIDE requires no retraining, no architectural modifications, and no manual intermediate-layer selection for the base deep learning model, thus ensuring broad applicability and minimal user intervention. The resulting model avoids distilling overconfidence from the base model, improves out-of-distribution detection by ~77% and adversarial attack detection by ~80%, while preserving in-distribution performance. Across diverse benchmarks, GUIDE consistently outperforms state-of-the-art approaches, evidencing the need to actively guide uncertainty in order to close the gap between predictive confidence and reliability.
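The abstract describes attaching an evidential meta-model to a frozen network. Below is a minimal NumPy sketch of the evidential-output part only, in the standard subjective-logic style (evidence → Dirichlet concentration α → predicted probabilities and vacuity uncertainty). All names and shapes here are illustrative assumptions; the paper's actual head architecture, saliency calibration, and training objective are not reproduced.

```python
import numpy as np

def evidential_head(features, W, b):
    """Map frozen-backbone features to non-negative evidence via softplus."""
    logits = features @ W + b
    return np.log1p(np.exp(logits))  # softplus keeps evidence >= 0

def dirichlet_outputs(evidence):
    """Dirichlet-based prediction and uncertainty for K classes."""
    alpha = evidence + 1.0                  # concentration parameters
    S = alpha.sum(axis=-1, keepdims=True)   # Dirichlet strength
    probs = alpha / S                       # expected class probabilities
    K = evidence.shape[-1]
    uncertainty = K / S.squeeze(-1)         # vacuity: high when evidence is low
    return probs, uncertainty

# Toy usage: random "features" standing in for a frozen model's activations.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8))
W = rng.normal(size=(8, 3)) * 0.1
b = np.zeros(3)
probs, u = dirichlet_outputs(evidential_head(features, W, b))
```

Because every αᵢ ≥ 1, the vacuity K/S always lies in (0, 1], approaching 1 when the head produces no evidence at all, which is exactly the behaviour a post-hoc uncertainty layer needs for inputs the base model has never seen.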


Key Contributions

  • GUIDE: first fully post-hoc meta-model using saliency calibration and noise-driven curriculum to explicitly teach a frozen model when and how to be uncertain
  • Theoretical guarantees for soundness and convergence of the evidential meta-model
  • ~80% improvement in adversarial attack detection and ~77% improvement in OOD detection across gradient- and non-gradient-based attacks at multiple perturbation strengths
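The noise-driven curriculum named in the contributions can be pictured as training the meta-model on progressively perturbed inputs, pairing each perturbation level with a higher target uncertainty. This toy sketch uses a hypothetical linear schedule purely for illustration; the paper's actual noise schedule and loss are not reproduced here.

```python
import numpy as np

def noise_curriculum(x, sigmas, rng):
    """Yield (noisy_input, target_uncertainty) pairs for one sample.

    Larger noise scale -> higher target uncertainty, so the meta-model is
    explicitly taught *when* to be uncertain rather than merely recalibrated.
    """
    sigma_max = max(sigmas)
    for sigma in sigmas:
        x_noisy = x + rng.normal(scale=sigma, size=x.shape)
        target_u = sigma / sigma_max  # toy monotone schedule in [0, 1]
        yield x_noisy, target_u

# Toy usage on a random feature vector with four perturbation strengths.
rng = np.random.default_rng(1)
x = rng.normal(size=16)
pairs = list(noise_curriculum(x, [0.0, 0.1, 0.5, 1.0], rng))
targets = [t for _, t in pairs]
```

A meta-model fit against such pairs learns a mapping from "how corrupted does this input look" to "how uncertain should I be", instead of inheriting the frozen model's confidence on corrupted inputs.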

🛡️ Threat Analysis

Input Manipulation Attack

GUIDE is explicitly evaluated as a defense that detects adversarial inputs (both gradient- and non-gradient-based attacks) via uncertainty estimation. The ~80% improvement in adversarial attack detection is a primary quantified headline result, making this a concrete adversarial-defense contribution.


Details

Domains: vision
Model Types: cnn, transformer
Threat Tags: white_box, black_box, inference_time, digital
Datasets: CIFAR-10, SVHN
Applications: image classification