Defense · 2025

Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model

Charmaine Barker, Daniel Bethell, Simos Gerasimou

1 citation · 39 references · arXiv


Published on arXiv: 2509.24492

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

GUIDE improves adversarial attack detection by ~80% and OOD detection by ~77% over state-of-the-art post-hoc uncertainty quantification baselines without modifying the base model.

GUIDE — novel technique introduced


Reliable uncertainty quantification remains a major obstacle to the deployment of deep learning models under distributional shift. Existing post-hoc approaches that retrofit pretrained models either inherit misplaced confidence or merely reshape predictions, without teaching the model when to be uncertain. We introduce GUIDE, a lightweight evidential learning meta-model approach that attaches to a frozen deep learning model and explicitly learns how and when to be uncertain. GUIDE identifies salient internal features via a calibration stage, and then employs these features to construct a noise-driven curriculum that teaches the model how and when to express uncertainty. GUIDE requires no retraining, no architectural modifications, and no manual intermediate-layer selection for the base deep learning model, thus ensuring broad applicability and minimal user intervention. The resulting model avoids distilling overconfidence from the base model, improves out-of-distribution detection by ~77% and adversarial attack detection by ~80%, while preserving in-distribution performance. Across diverse benchmarks, GUIDE consistently outperforms state-of-the-art approaches, evidencing the need to actively guide uncertainty in order to close the gap between predictive confidence and reliability.
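The abstract describes attaching an evidential meta-model to a frozen network. Below is a minimal NumPy sketch of the evidential-output part only, in the standard subjective-logic style (evidence → Dirichlet concentration α → predicted probabilities and vacuity uncertainty). All names and shapes here are illustrative assumptions; the paper's actual head architecture, saliency calibration, and training objective are not reproduced.

```python
import numpy as np

def evidential_head(features, W, b):
    """Map frozen-backbone features to non-negative evidence via softplus."""
    logits = features @ W + b
    return np.log1p(np.exp(logits))  # softplus keeps evidence >= 0

def dirichlet_outputs(evidence):
    """Dirichlet-based prediction and uncertainty for K classes."""
    alpha = evidence + 1.0                  # concentration parameters
    S = alpha.sum(axis=-1, keepdims=True)   # Dirichlet strength
    probs = alpha / S                       # expected class probabilities
    K = evidence.shape[-1]
    uncertainty = K / S.squeeze(-1)         # vacuity: high when evidence is low
    return probs, uncertainty

# Toy usage: random "features" standing in for a frozen model's activations.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8))
W = rng.normal(size=(8, 3)) * 0.1
b = np.zeros(3)
probs, u = dirichlet_outputs(evidential_head(features, W, b))
```

Because every αᵢ ≥ 1, the vacuity K/S always lies in (0, 1], approaching 1 when the head produces no evidence at all, which is exactly the behaviour a post-hoc uncertainty layer needs for inputs the base model has never seen.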


Key Contributions

  • GUIDE: first fully post-hoc meta-model using saliency calibration and noise-driven curriculum to explicitly teach a frozen model when and how to be uncertain
  • Theoretical guarantees for soundness and convergence of the evidential meta-model
  • ~80% improvement in adversarial attack detection and ~77% improvement in OOD detection across gradient- and non-gradient-based attacks at multiple perturbation strengths
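The noise-driven curriculum named in the contributions can be pictured as training the meta-model on progressively perturbed inputs, pairing each perturbation level with a higher target uncertainty. This toy sketch uses a hypothetical linear schedule purely for illustration; the paper's actual noise schedule and loss are not reproduced here.

```python
import numpy as np

def noise_curriculum(x, sigmas, rng):
    """Yield (noisy_input, target_uncertainty) pairs for one sample.

    Larger noise scale -> higher target uncertainty, so the meta-model is
    explicitly taught *when* to be uncertain rather than merely recalibrated.
    """
    sigma_max = max(sigmas)
    for sigma in sigmas:
        x_noisy = x + rng.normal(scale=sigma, size=x.shape)
        target_u = sigma / sigma_max  # toy monotone schedule in [0, 1]
        yield x_noisy, target_u

# Toy usage on a random feature vector with four perturbation strengths.
rng = np.random.default_rng(1)
x = rng.normal(size=16)
pairs = list(noise_curriculum(x, [0.0, 0.1, 0.5, 1.0], rng))
targets = [t for _, t in pairs]
```

A meta-model fit against such pairs learns a mapping from "how corrupted does this input look" to "how uncertain should I be", instead of inheriting the frozen model's confidence on corrupted inputs.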

🛡️ Threat Analysis

Input Manipulation Attack

GUIDE is explicitly evaluated as a defense that detects adversarial inputs (both gradient- and non-gradient-based attacks) via uncertainty estimation. The ~80% improvement in adversarial attack detection is a primary quantified headline result, making this a concrete adversarial-defense contribution.


Details

Domains: vision
Model Types: cnn, transformer
Threat Tags: white_box, black_box, inference_time, digital
Datasets: CIFAR-10, SVHN
Applications: image classification