defense 2025

Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR

Masako Kishida

0 citations · 43 references · arXiv


Published on arXiv: 2509.17413

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

The proposed SDP-based framework provides tail-risk-aware safety certificates that systematically trade conservatism against tolerance to worst-case distributional tail events, while preserving the computational tractability of prior QC/SDP verification methods.

WC-CVaR QC/SDP Verification

Novel technique introduced


Ensuring the safety of neural networks under input uncertainty is a fundamental challenge in safety-critical applications. This paper extends Fazlyab's quadratic-constraint (QC) and semidefinite-programming (SDP) framework for neural network verification to a distributionally robust, tail-risk-aware setting by integrating worst-case Conditional Value-at-Risk (WC-CVaR) over a moment-based ambiguity set with fixed mean and covariance. The resulting conditions remain SDP-checkable and explicitly account for tail risk. The integration broadens the admissible input-uncertainty geometry (ellipsoids, polytopes, and hyperplanes) and extends applicability to safety-critical domains where the severity of tail events matters. Numerical experiments on closed-loop reachability of control systems and on classification illustrate how the risk level $\varepsilon$ trades conservatism for tolerance to tail events, while preserving the computational structure of prior QC/SDP methods for neural network verification and robustness analysis.
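For context, the quantities the abstract refers to can be sketched with the standard definitions from the distributionally robust optimization literature; the symbols below are illustrative and are not taken verbatim from the paper:

```latex
% Rockafellar--Uryasev form of CVaR at risk level \varepsilon for a loss L
\mathrm{CVaR}_\varepsilon(L) = \inf_{\tau \in \mathbb{R}}
  \left\{ \tau + \tfrac{1}{\varepsilon}\,\mathbb{E}\!\left[(L - \tau)_+\right] \right\}

% Worst case over the moment-based ambiguity set \mathcal{P}(\mu, \Sigma)
% of all distributions with mean \mu and covariance \Sigma
\mathrm{WC\text{-}CVaR}_\varepsilon(L)
  = \sup_{\mathbb{P} \in \mathcal{P}(\mu, \Sigma)} \mathrm{CVaR}_\varepsilon^{\mathbb{P}}(L)

% Known closed form for a linear loss L = a^\top x over \mathcal{P}(\mu, \Sigma)
\mathrm{WC\text{-}CVaR}_\varepsilon(a^\top x)
  = a^\top \mu + \sqrt{\tfrac{1-\varepsilon}{\varepsilon}}\;\sqrt{a^\top \Sigma\, a}
```

The factor $\sqrt{(1-\varepsilon)/\varepsilon}$ makes the conservatism/tail-tolerance trade-off explicit: it grows without bound as $\varepsilon \to 0$ and shrinks as $\varepsilon$ increases.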


Key Contributions

  • Extends Fazlyab's QC/SDP neural network verification framework to incorporate Worst-Case Conditional Value-at-Risk (WC-CVaR) over a moment-based distributional ambiguity set
  • Derives SDP-checkable safety conditions that explicitly account for tail risk, broadening input-uncertainty geometry to ellipsoids, polytopes, and hyperplanes
  • Establishes a formal equivalence between WC-CVaR and confidence-ellipsoid methods on special ellipsoidal sets, with numerical validation on closed-loop reachability and classification tasks
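A minimal sketch of the tail-risk idea behind these contributions, using the well-known closed form of WC-CVaR for a linear loss over a moment ambiguity set (pure Python; the function names are hypothetical and this is not the paper's SDP machinery, which handles the nonlinear network via quadratic constraints):

```python
import math

def wc_cvar_linear(a, mu, sigma, eps):
    """Worst-case CVaR at level eps of the linear loss a^T x over all
    distributions with mean mu and covariance sigma (standard DRO
    closed form: a^T mu + sqrt((1-eps)/eps) * sqrt(a^T Sigma a))."""
    n = len(a)
    mean_term = sum(a[i] * mu[i] for i in range(n))
    var = sum(a[i] * sigma[i][j] * a[j] for i in range(n) for j in range(n))
    return mean_term + math.sqrt((1.0 - eps) / eps) * math.sqrt(var)

def certify_halfspace(a, c, mu, sigma, eps):
    """Certify the half-space safety constraint a^T x <= c in the
    WC-CVaR sense: safe if even the worst-case tail average stays below c."""
    return wc_cvar_linear(a, mu, sigma, eps) <= c

# Illustrative moments for a 2-D uncertain quantity (hypothetical numbers).
mu = [0.0, 0.0]
sigma = [[0.04, 0.0], [0.0, 0.04]]
a = [1.0, 1.0]
for eps in (0.5, 0.1, 0.01):
    # Smaller eps -> heavier weight on the tail -> more conservative bound.
    print(eps, round(wc_cvar_linear(a, mu, sigma, eps), 3))
    # -> 0.5 0.283, 0.1 0.849, 0.01 2.814
```

Shrinking $\varepsilon$ tightens tolerance to tail events at the cost of conservatism: with the safe level $c = 1.0$, the constraint certifies at $\varepsilon = 0.1$ but fails at $\varepsilon = 0.01$.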

🛡️ Threat Analysis

Input Manipulation Attack

Proposes a certified-robustness / safety-verification method for neural networks under input perturbation: it extends QC/SDP-based robustness analysis (a known ML01 defense technique) with distributionally robust WC-CVaR to provide formal safety certificates against worst-case tail-event inputs.


Details

Model Types
CNN
Threat Tags
white_box, inference_time
Applications
safety-critical control systems, image classification