defense 2026

NERO-Net: A Neuroevolutionary Approach for the Design of Adversarially Robust CNNs

Inês Valentim , Nuno Antunes , Nuno Lourenço

0 citations


Published on arXiv

2603.25517

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

With standard training alone, the evolved architecture achieves 47% adversarial accuracy against FGSM and 93% clean accuracy on CIFAR-10, suggesting intrinsic robustness; with adversarial training, it reaches 40% robust accuracy against AutoAttack

NERO-Net

Novel technique introduced


Neuroevolution automates the complex task of neural network design but often ignores the inherent adversarial fragility of evolved models, which is a barrier to adoption in safety-critical scenarios. While robust training methods have received significant attention, the design of architectures exhibiting intrinsic robustness remains largely unexplored. In this paper, we propose NERO-Net, a neuroevolutionary approach for designing convolutional neural networks better equipped to resist adversarial attacks. Our search strategy isolates architectural influence on robustness by avoiding adversarial training during the evolutionary loop. As such, our fitness function promotes candidates that, even when trained with standard (non-robust) methods, achieve high post-attack accuracy without sacrificing accuracy on clean samples. We assess NERO-Net on CIFAR-10 with a specific focus on $L_\infty$-robustness. In particular, the fittest individual emerged from evolutionary search with 33% accuracy against FGSM, used as an efficient estimator for robustness during the search phase, while maintaining 87% clean accuracy. Further standard training of this individual boosted these metrics to 47% adversarial and 93% clean accuracy, suggesting inherent architectural robustness. Adversarial training brings the accuracy of the model against AutoAttack up to 40%.
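The abstract uses FGSM as a cheap robustness estimator during search. A minimal sketch of that attack follows, assuming a toy logistic-regression classifier rather than a CNN so that the input gradient can be written by hand. The weights, data points, and the budget `eps = 0.1` are illustrative assumptions and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # Probability of class 1 under a linear logistic model (toy stand-in for a CNN).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One-step L_inf attack: x_adv = clip(x + eps * sign(dLoss/dx), 0, 1)."""
    p = predict_proba(w, b, x)
    # For binary cross-entropy on a logistic model, dLoss/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

def accuracy(w, b, xs, ys):
    hits = sum(int(predict_proba(w, b, x) >= 0.5) == y for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical model and data, chosen only to make the effect visible.
w, b = [2.0, -2.0], 0.0
xs = [[0.9, 0.1], [0.55, 0.45], [0.1, 0.9], [0.45, 0.55]]
ys = [1, 1, 0, 0]
eps = 0.1  # illustrative L_inf budget, not the paper's setting

xs_adv = [fgsm(w, b, x, y, eps) for x, y in zip(xs, ys)]
clean_acc = accuracy(w, b, xs, ys)
adv_acc = accuracy(w, b, xs_adv, ys)
print(clean_acc, adv_acc)
```

In this toy setup the two points closest to the decision boundary are misclassified after the perturbation, while the two far from it survive. Because FGSM needs only one gradient computation per input, it is a plausible choice when every candidate in a population has to be scored.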


Key Contributions

  • Neuroevolutionary framework (NERO-Net) that co-optimizes clean accuracy and adversarial robustness through its fitness function, without adversarial training during search
  • Flexible genotypic representation supporting diverse topological connectivity patterns beyond cell-based search spaces
  • Evidence of intrinsic architectural robustness: the evolved architecture achieves 47% adversarial accuracy (FGSM) with standard training, and 40% against AutoAttack with adversarial training
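The first contribution describes a fitness function that rewards both clean and post-attack accuracy. The summary does not give its exact form, so the sketch below assumes a simple weighted sum. The `Candidate` class, the `weight` parameter, and the accuracy values are all hypothetical and serve only to show how such a score could rank a population.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    clean_acc: float  # accuracy on unperturbed samples after standard training
    adv_acc: float    # accuracy on attacked samples (e.g. FGSM), same training

def fitness(c, weight=0.5):
    # Assumed scalarisation: a convex combination of the two accuracies.
    # The paper's actual fitness function may differ.
    return weight * c.clean_acc + (1.0 - weight) * c.adv_acc

def select_fittest(population, weight=0.5):
    return max(population, key=lambda c: fitness(c, weight))

# Toy population with made-up accuracies (not results from the paper).
population = [
    Candidate("A", clean_acc=0.90, adv_acc=0.10),  # accurate but fragile
    Candidate("B", clean_acc=0.85, adv_acc=0.30),  # balanced
    Candidate("C", clean_acc=0.60, adv_acc=0.35),  # robust but inaccurate
]

best = select_fittest(population)
print(best.name, round(fitness(best), 3))
```

Under equal weighting the balanced candidate wins, which matches the stated goal of gaining post-attack accuracy without giving up clean accuracy. Shifting `weight` toward 1.0 would favor clean accuracy instead.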

🛡️ Threat Analysis

Input Manipulation Attack

The paper defends against adversarial examples (FGSM, PGD, AutoAttack) at inference time by designing architectures with intrinsic robustness to $L_\infty$ and $L_2$ perturbations.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, inference_time, untargeted, digital
Datasets
CIFAR-10
Applications
image classification