
Formal Reasoning About Confidence and Automated Verification of Neural Networks

Mohammad Afzal 1,2, S. Akshay 1, Blaise Genest 3, Ashutosh Gupta 1

0 citations · 46 references · arXiv (Cornell University)

Published on arXiv · 2511.07293

Input Manipulation Attack (OWASP ML Top 10: ML01)

Key Finding

The layer-augmentation approach outperforms ad-hoc encodings across 8870 benchmarks while supporting a strictly broader class of confidence-based robustness specifications.


In the last decade, a large body of work has emerged on the robustness of neural networks, i.e., checking whether the decision remains unchanged when the input is slightly perturbed. However, most of these approaches ignore the confidence of a neural network in its output. In this work, we aim to develop a generalized framework for formally reasoning about confidence along with robustness in neural networks. We propose a simple yet expressive grammar that captures various confidence-based specifications. We develop a novel and unified technique to verify all instances of the grammar in a homogeneous way, viz., by adding a few additional layers to the neural network, which enables the use of any state-of-the-art neural network verification tool. We perform an extensive experimental evaluation over a large suite of 8870 benchmarks, where the largest network has 138M parameters, and show that our approach outperforms ad-hoc encoding approaches by a significant margin.


Key Contributions

  • A simple yet expressive grammar that captures diverse confidence-based robustness specifications in a unified way
  • A novel verification technique that encodes all grammar instances by appending auxiliary layers to the neural network, enabling plug-in use of any existing verifier
  • Extensive evaluation on 8870 benchmarks (up to 138M parameters) demonstrating significant improvement over ad-hoc encoding approaches
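The layer-augmentation idea behind these contributions can be illustrated on the simplest specification, plain decision robustness. A minimal numpy sketch (this is an illustration of the general technique, not the paper's actual construction; `augment_with_margin_layer` and the toy weights are hypothetical): appending one linear layer that computes the margins y_c − y_j turns "the decision stays class c" into "all augmented outputs are positive", a property an off-the-shelf verifier can bound directly.

```python
import numpy as np

def augment_with_margin_layer(W, b, c):
    """Append a linear layer D to the final layer (W, b) so the new
    outputs are the margins y_c - y_j for j != c. The decision
    'argmax = c' holds iff every augmented output is positive."""
    k = W.shape[0]  # number of classes
    D = np.zeros((k - 1, k))
    for i, j in enumerate(j for j in range(k) if j != c):
        D[i, c] = 1.0   # + y_c
        D[i, j] = -1.0  # - y_j
    return D @ W, D @ b

# Toy final layer of a classifier: y = W x + b (hypothetical values)
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
b = np.array([0.2, 0.0, 0.0])
x = np.array([1.0, 0.0])

Wm, bm = augment_with_margin_layer(W, b, c=0)
margins = Wm @ x + bm        # [y_0 - y_1, y_0 - y_2]
assert np.all(margins > 0)   # class 0 wins on this input

# A confidence specification such as softmax(y)_c >= tau can likewise
# be rewritten as y_c - logsumexp(y) >= log(tau); encoding such
# nonlinear comparisons as extra layers is where the paper's unified
# technique applies (sketch only, not the paper's encoding).
y = W @ x + b
conf = np.exp(y[0]) / np.exp(y).sum()
```

Because the augmentation is itself just network layers, the composite network can be handed unchanged to any existing verifier, which is the plug-in property the second contribution refers to.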

🛡️ Threat Analysis

Input Manipulation Attack

The paper presents a defense/verification framework for adversarial robustness: it formally certifies that a neural network's decision (and confidence) remains unchanged under input perturbations, which directly addresses the Input Manipulation Attack threat.


Details

Domains: vision
Model Types: cnn, transformer
Threat Tags: white_box, inference_time
Datasets: 8870 verification benchmarks (up to 138M-parameter networks)
Applications: neural network robustness verification, image classification