Formal Reasoning About Confidence and Automated Verification of Neural Networks
Mohammad Afzal 1,2, S. Akshay 1, Blaise Genest 3, Ashutosh Gupta 1
Published on arXiv (arXiv:2511.07293)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The layer-augmentation approach outperforms ad-hoc encodings on 8870 benchmarks while supporting a strictly broader class of confidence-based robustness specifications.
In the last decade, a large body of work has emerged on the robustness of neural networks, i.e., checking whether the decision remains unchanged when the input is slightly perturbed. However, most of these approaches ignore the confidence of a neural network in its output. In this work, we aim to develop a generalized framework for formally reasoning about confidence along with robustness in neural networks. We propose a simple yet expressive grammar that captures various confidence-based specifications. We develop a novel and unified technique to verify all instances of the grammar in a homogeneous way, viz., by adding a few additional layers to the neural network, which enables the use of any state-of-the-art neural network verification tool. We perform an extensive experimental evaluation over a large suite of 8870 benchmarks, where the largest network has 138M parameters, and show that our approach outperforms ad-hoc encoding approaches by a significant margin.
Key Contributions
- A simple yet expressive grammar that captures diverse confidence-based robustness specifications in a unified way
- A novel verification technique that encodes all grammar instances by appending auxiliary layers to the neural network, enabling plug-in use of any existing verifier
- Extensive evaluation on 8870 benchmarks (up to 138M parameters) demonstrating significant improvement over ad-hoc encoding approaches
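The core idea behind the second contribution is that a specification can be compiled into extra layers appended to the network, so an off-the-shelf verifier only has to certify a sign condition on the augmented network's outputs. The sketch below is our own illustration of this pattern (not the paper's exact construction) for the simplest case: encoding "class `target` is the predicted class" as a fixed linear layer whose outputs are the classification margins; the `margin_layer` helper and the toy logits are hypothetical.

```python
# Illustration of the layer-augmentation idea (our sketch, not the paper's
# exact encoding): append a fixed linear layer that computes the margins
# z[target] - z[j] for every j != target. The property "target is the argmax"
# holds iff every output of the augmented network is strictly positive, which
# is exactly the kind of condition standard verifiers can check.
import numpy as np

def margin_layer(num_classes, target):
    """Weight matrix of a fixed (non-trainable) linear layer computing
    z[target] - z[j] for all j != target."""
    W = np.zeros((num_classes - 1, num_classes))
    others = [j for j in range(num_classes) if j != target]
    for i, j in enumerate(others):
        W[i, target] = 1.0   # coefficient +1 on the target logit
        W[i, j] = -1.0       # coefficient -1 on the competing logit
    return W

# Toy logits where class 0 wins: both margins come out positive.
z = np.array([2.0, 1.0, -1.0])
margins = margin_layer(3, target=0) @ z
print(margins)  # → [1. 3.]
```

Confidence-based specifications from the paper's grammar (e.g., thresholds on the softmax confidence) would be encoded analogously, with additional layers rather than a single linear one; the point is that the augmented model remains an ordinary network that any existing verifier accepts as input.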
🛡️ Threat Analysis
The paper presents a defense/verification framework for adversarial robustness — formally certifying that a neural network's decision (and its confidence) remains unchanged under input perturbations, which directly addresses the Input Manipulation Attack (OWASP ML01) threat.