Formal Reasoning About Confidence and Automated Verification of Neural Networks
Mohammad Afzal 1,2, S. Akshay 1, Blaise Genest 3, Ashutosh Gupta 1
Published on arXiv (arXiv:2511.07293)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The layer-augmentation approach outperforms ad-hoc encodings on 8870 benchmarks while supporting a strictly broader class of confidence-based robustness specifications.
In the last decade, a large body of work has emerged on the robustness of neural networks, i.e., checking whether the decision remains unchanged when the input is slightly perturbed. However, most of these approaches ignore the confidence of a neural network in its output. In this work, we aim to develop a generalized framework for formally reasoning about confidence along with robustness in neural networks. We propose a simple yet expressive grammar that captures various confidence-based specifications. We develop a novel and unified technique to verify all instances of the grammar in a homogeneous way, viz., by adding a few additional layers to the neural network, which enables the use of any state-of-the-art neural network verification tool. We perform an extensive experimental evaluation over a large suite of 8870 benchmarks, where the largest network has 138M parameters, and show that our approach outperforms ad-hoc encoding approaches by a significant margin.
Key Contributions
- A simple yet expressive grammar that captures diverse confidence-based robustness specifications in a unified way
- A novel verification technique that encodes all grammar instances by appending auxiliary layers to the neural network, enabling plug-in use of any existing verifier
- Extensive evaluation on 8870 benchmarks (up to 138M parameters) demonstrating significant improvement over ad-hoc encoding approaches
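The core idea behind the second contribution is that a specification can be compiled into extra layers appended to the network, so an off-the-shelf verifier only has to certify a sign condition on the augmented network's outputs. The sketch below is our own illustration of this pattern (not the paper's exact construction) for the simplest case: encoding "class `target` is the predicted class" as a fixed linear layer whose outputs are the classification margins; the `margin_layer` helper and the toy logits are hypothetical.

```python
# Illustration of the layer-augmentation idea (our sketch, not the paper's
# exact encoding): append a fixed linear layer that computes the margins
# z[target] - z[j] for every j != target. The property "target is the argmax"
# holds iff every output of the augmented network is strictly positive, which
# is exactly the kind of condition standard verifiers can check.
import numpy as np

def margin_layer(num_classes, target):
    """Weight matrix of a fixed (non-trainable) linear layer computing
    z[target] - z[j] for all j != target."""
    W = np.zeros((num_classes - 1, num_classes))
    others = [j for j in range(num_classes) if j != target]
    for i, j in enumerate(others):
        W[i, target] = 1.0   # coefficient +1 on the target logit
        W[i, j] = -1.0       # coefficient -1 on the competing logit
    return W

# Toy logits where class 0 wins: both margins come out positive.
z = np.array([2.0, 1.0, -1.0])
margins = margin_layer(3, target=0) @ z
print(margins)  # → [1. 3.]
```

Confidence-based specifications from the paper's grammar (e.g., thresholds on the softmax confidence) would be encoded analogously, with additional layers rather than a single linear one; the point is that the augmented model remains an ordinary network that any existing verifier accepts as input.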
🛡️ Threat Analysis
The paper presents a defense/verification framework for adversarial robustness — formally certifying that a neural network's decision (and its confidence) remains unchanged under input perturbations, which directly addresses the Input Manipulation Attack (OWASP ML01) threat.