Probably Approximately Global Robustness Certification
Peter Blohm, Patrick Indri, Thomas Gärtner, Sagar Malhotra
Published on arXiv: 2511.06495
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Empirically characterizes global robustness better than state-of-the-art sampling-based approaches while scaling to larger networks than formal verification methods
Probably Approximately Global Robustness (PAGR) Certification
Novel technique introduced
We propose and investigate probabilistic guarantees for the adversarial robustness of classification algorithms. While traditional formal verification approaches for robustness are intractable and sampling-based approaches do not provide formal guarantees, our approach is able to efficiently certify a probabilistic relaxation of robustness. The key idea is to sample an $ε$-net and invoke a local robustness oracle on the sample. Remarkably, the size of the sample needed to achieve probably approximately global robustness guarantees is independent of the input dimensionality, the number of classes, and the learning algorithm itself. Our approach can, therefore, be applied even to large neural networks that are beyond the scope of traditional formal verification. Experiments empirically confirm that it characterizes robustness better than state-of-the-art sampling-based approaches and scales better than formal methods.
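The sample-then-check idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy 1-D classifier, the `confidence` function, the `is_locally_robust` oracle, the `RADIUS` budget, and the helper name `smallest_safe_confidence` are all hypothetical, and the sample size of 1000 is arbitrary rather than taken from the paper's bound.

```python
import random
from typing import Callable, Iterable


def smallest_safe_confidence(
    points: Iterable[float],
    confidence: Callable[[float], float],
    is_locally_robust: Callable[[float], bool],
) -> float:
    """Return the highest confidence at which the oracle found a non-robust sample.

    Every sampled point with confidence strictly above the returned value
    passed the local robustness oracle. Returns 0.0 if all samples passed.
    """
    worst = 0.0
    for x in points:
        if not is_locally_robust(x):
            worst = max(worst, confidence(x))
    return worst


# Hypothetical toy setup: the classifier is sign(x) on [-1, 1].
RADIUS = 0.1  # assumed perturbation budget


def confidence(x: float) -> float:
    # Assumed confidence score: distance from the decision boundary at 0.
    return abs(x)


def is_locally_robust(x: float) -> bool:
    # Exact oracle for this toy model: no perturbation of size <= RADIUS
    # can change the sign of x when |x| > RADIUS.
    return abs(x) > RADIUS


rng = random.Random(0)
sample = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # i.i.d. draws
kappa = smallest_safe_confidence(sample, confidence, is_locally_robust)
```

In a realistic setting the oracle would be a formal verifier or an adversarial attack run on a trained network, and the confidence could be, for example, a softmax margin. Both choices are assumptions here, not details confirmed by the summary.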
Key Contributions
- Probabilistic global robustness certification framework using ε-net sampling, parameterized by prediction confidence
- Sample size bounds that are independent of input dimensionality, number of classes, and learning algorithm — enabling application to large NNs
- Oracle-agnostic approach compatible with both formal verification and adversarial attack methods as local robustness checkers
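To see why a sample size can be independent of input dimensionality, consider a simplified textbook argument for a single fixed confidence threshold. It is an illustration only: the paper's actual bound (which uses an $ε$-net) may differ in form and constants, and the symbols $p$, $δ$, and $N$ are introduced here as assumptions. Let $p$ be the probability that a random input is confidently classified but not locally robust. If $p > ε$, then $N$ independent samples all miss such a point with probability

$$(1 - p)^N < (1 - ε)^N \le e^{-εN}.$$

Requiring $e^{-εN} \le δ$ gives

$$N \ge \frac{1}{ε} \ln \frac{1}{δ}.$$

For example, $ε = 0.01$ and $δ = 0.001$ give $N \ge 100 \cdot \ln 1000 \approx 690.8$, so 691 samples suffice in this simplified setting. Neither the input dimension, the number of classes, nor the learning algorithm appears in the expression.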
🛡️ Threat Analysis
The paper proposes a global robustness certification framework against adversarial perturbations, which directly addresses input manipulation attacks. It invokes local robustness oracles (attacks such as FGSM, PGD, and C&W, or formal verifiers) on sampled inputs to provide high-probability guarantees that a classifier is robust to adversarial examples on all but a small fraction of the input space, relative to the chosen oracle.