Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates
Samrendra Roy 1, Kazuma Kobayashi 1, Souvik Chakraborty 2, Rizwan-uddin 1, Syed Bahauddin Alam 1,3
Published on arXiv (arXiv:2603.22525)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Sparse adversarial perturbations (< 1% of inputs) increase relative L2 error from ~1.5% to 37-63% across four operator architectures, with 100% of single-point attacks passing z-score anomaly detection
Gradient-free differential evolution attack on neural operators
Novel technique introduced
Operator learning models are rapidly emerging as the predictive core of digital twins for nuclear and energy systems, promising real-time field reconstruction from sparse sensor measurements. Yet their robustness to adversarial perturbations remains uncharacterized, a critical gap for deployment in safety-critical systems. Here we show that neural operators are acutely vulnerable to extremely sparse (fewer than 1% of inputs), physically plausible perturbations that exploit their sensitivity to boundary conditions. Using gradient-free differential evolution across four operator architectures, we demonstrate that minimal modifications trigger catastrophic prediction failures, increasing relative $L_2$ error from $\sim$1.5% (validated accuracy) to 37-63% while remaining completely undetectable by standard validation metrics. Notably, 100% of successful single-point attacks pass z-score anomaly detection. We introduce the effective perturbation dimension $d_{\text{eff}}$, a Jacobian-based diagnostic that, together with sensitivity magnitude, yields a two-factor vulnerability model explaining why architectures with extreme sensitivity concentration (POD-DeepONet, $d_{\text{eff}} \approx 1$) are not necessarily the most exploitable, since low-rank output projections cap maximum error, while moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) produces the highest attack success. Gradient-free search outperforms gradient-based alternatives (PGD) on architectures with gradient pathologies, while random perturbations of equal magnitude achieve near-zero success rates, confirming that the discovered vulnerabilities are structural. Our findings expose a previously overlooked attack surface in operator learning models and establish that these models require robustness guarantees beyond standard validation before deployment.
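The abstract's two-factor vulnerability model hinges on the effective perturbation dimension $d_{\text{eff}}$, a measure of how concentrated the model's input sensitivity is. The paper does not reproduce its exact definition here; a plausible Jacobian-spectrum diagnostic of this kind is the inverse participation ratio of the singular-value energy, sketched below (the function name and formula are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def effective_perturbation_dimension(jacobian):
    """Participation-ratio estimate of how many input directions the
    model's sensitivity is spread over. NOTE: one common choice of
    Jacobian-spectrum diagnostic; the paper's d_eff may be defined
    differently."""
    s = np.linalg.svd(jacobian, compute_uv=False)
    p = s**2 / np.sum(s**2)      # normalized spectral energy per direction
    return 1.0 / np.sum(p**2)    # inverse participation ratio

# Sensitivity concentrated in one direction (POD-DeepONet-like regime):
J_concentrated = np.diag([10.0, 0.1, 0.1, 0.1])
# Sensitivity spread across several directions:
J_spread = np.diag([5.0, 4.0, 3.0, 2.0])

print(effective_perturbation_dimension(J_concentrated))  # ~1
print(effective_perturbation_dimension(J_spread))        # ~3
```

Under this definition, $d_{\text{eff}} \approx 1$ means one dominant sensitivity direction, matching the paper's characterization of extreme sensitivity concentration.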
Key Contributions
- First demonstration that neural operator models are vulnerable to extremely sparse adversarial perturbations (< 1% of inputs) that cause catastrophic prediction failures while passing standard validation
- Introduction of effective perturbation dimension (d_eff) as a Jacobian-based diagnostic that predicts vulnerability across operator architectures
- Demonstration that gradient-free differential evolution outperforms gradient-based attacks (PGD) on architectures with gradient pathologies, while random perturbations achieve near-zero success rates
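The attack loop described above can be sketched with an off-the-shelf differential-evolution optimizer: search over a sparse, bounded perturbation of the input and maximize the surrogate's relative $L_2$ prediction error. The `surrogate` below is a toy stand-in (the paper attacks trained DeepONet/FNO-style operators), and the bounds and attack index are illustrative:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy stand-in for a neural-operator surrogate: maps a 100-point
# boundary-condition vector to a predicted field. Hypothetical model,
# used only to show the shape of the attack loop.
def surrogate(bc):
    return np.cumsum(bc) * 0.01

x_clean = np.ones(100)      # nominal boundary condition
y_clean = surrogate(x_clean)

ATTACK_IDX = 0              # single-point attack: perturb one input only

def rel_l2_error(delta):
    x_adv = x_clean.copy()
    x_adv[ATTACK_IDX] += delta[0]
    y_adv = surrogate(x_adv)
    return np.linalg.norm(y_adv - y_clean) / np.linalg.norm(y_clean)

# Gradient-free search: maximize error by minimizing its negative.
# No gradients of the surrogate are queried, which is what makes the
# method applicable to architectures with gradient pathologies.
result = differential_evolution(
    lambda d: -rel_l2_error(d),
    bounds=[(-0.5, 0.5)],   # "physically plausible" perturbation range
    seed=0,
    maxiter=50,
)
print(f"perturbation {result.x[0]:+.3f} -> rel L2 error {-result.fun:.2%}")
```

Because the optimizer only evaluates the model forward, the same loop works unchanged across architectures, whereas PGD requires usable gradients.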
🛡️ Threat Analysis
The paper demonstrates adversarial perturbation attacks on neural operator models at inference time, crafting minimal input perturbations (sparse boundary-condition modifications) that cause catastrophic prediction failures, increasing relative L2 error from ~1.5% to 37-63%. It uses both gradient-free differential evolution and gradient-based PGD attacks to manipulate model inputs, with the gradient-free search proving more effective on architectures with gradient pathologies. This is a clear input manipulation attack on ML models during inference.
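The claim that 100% of single-point attacks pass z-score screening has a simple intuition: one perturbed sensor among many barely shifts the sample mean and standard deviation, so its own z-score can stay under a typical 3-sigma threshold. A minimal illustration, with made-up sensor values (not the paper's data):

```python
import numpy as np

# Deterministic stand-in for 100 sensor readings, e.g. coolant
# temperatures in K (illustrative values only).
readings = 300.0 + 5.0 * np.sin(np.arange(100))

adv = readings.copy()
adv[42] += 6.0          # sparse, physically plausible single-point change

# Standard z-score anomaly screen over the (perturbed) sample.
z = np.abs((adv - adv.mean()) / adv.std())
print(f"z-score at attacked sensor: {z[42]:.2f} (threshold 3.0)")
```

The perturbed point lands well inside the 3-sigma band, so per-sample statistical screening alone cannot flag attacks of this form; this is why the paper argues for robustness guarantees beyond standard validation.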