Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model
Charmaine Barker, Daniel Bethell, Simos Gerasimou
Published on arXiv (2509.24492)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
GUIDE improves adversarial attack detection by ~80% and OOD detection by ~77% over state-of-the-art post-hoc uncertainty quantification baselines without modifying the base model.
GUIDE
Novel technique introduced
Reliable uncertainty quantification remains a major obstacle to the deployment of deep learning models under distributional shift. Existing post-hoc approaches that retrofit pretrained models either inherit misplaced confidence or merely reshape predictions, without teaching the model when to be uncertain. We introduce GUIDE, a lightweight evidential meta-model approach that attaches to a frozen deep learning model and explicitly learns how and when to be uncertain. GUIDE identifies salient internal features via a calibration stage, and then employs these features to construct a noise-driven curriculum that teaches the model how and when to express uncertainty. GUIDE requires no retraining, no architectural modifications, and no manual intermediate-layer selection for the base deep learning model, thus ensuring broad applicability and minimal user intervention. The resulting model avoids distilling overconfidence from the base model and improves out-of-distribution detection by ~77% and adversarial attack detection by ~80%, while preserving in-distribution performance. Across diverse benchmarks, GUIDE consistently outperforms state-of-the-art approaches, evidencing the need to actively guide uncertainty in order to close the gap between predictive confidence and reliability.
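To make the evidential framing concrete, here is a minimal NumPy sketch of the Dirichlet uncertainty computation that evidential meta-models of this kind typically use: non-negative per-class "evidence" is mapped to belief masses plus an explicit uncertainty mass. The function and the example inputs are illustrative assumptions for exposition, not GUIDE's actual implementation or its saliency-calibration stage.

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Map non-negative class evidence to per-class belief masses and a
    scalar uncertainty mass, in the subjective-logic style used by
    evidential deep learning (alpha = evidence + 1)."""
    alpha = evidence + 1.0                       # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True) # Dirichlet strength S
    belief = evidence / strength                 # per-class belief masses
    u = evidence.shape[-1] / strength            # uncertainty mass K / S
    return belief, u.squeeze(-1)

# Zero evidence (e.g. a heavily perturbed input in a noise-driven
# curriculum) yields maximal uncertainty, u = 1.
_, u_noisy = dirichlet_uncertainty(np.zeros((1, 3)))

# Strong evidence for one class yields low uncertainty.
_, u_clean = dirichlet_uncertainty(np.array([[50.0, 0.0, 0.0]]))
```

A curriculum in this spirit would perturb in-distribution inputs at increasing noise strengths and train the meta-model head so that its uncertainty mass `u` grows with the perturbation, rather than leaving uncertainty as an untrained by-product of the frozen model's confidences.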
Key Contributions
- GUIDE: first fully post-hoc meta-model using saliency calibration and noise-driven curriculum to explicitly teach a frozen model when and how to be uncertain
- Theoretical guarantees for soundness and convergence of the evidential meta-model
- ~80% improvement in adversarial attack detection and ~77% improvement in OOD detection across gradient- and non-gradient-based attacks at multiple perturbation strengths
🛡️ Threat Analysis
GUIDE is explicitly evaluated as a defense that detects adversarial inputs (both gradient- and non-gradient-based attacks) via uncertainty estimation. The ~80% improvement in adversarial attack detection is a primary quantified headline result, making this a concrete adversarial defense contribution.