defense 2025

Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks

Amirhossein Nazeri, Wael Hafez

Published on arXiv (arXiv:2508.21715)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Parallel entropy monitoring on VGG-16 achieves 90% adversarial detection accuracy by exploiting a consistent 7% activation entropy shift in early convolutional layers, with no model modification required.


Convolutional Neural Networks (CNNs) have become the foundation of modern computer vision, achieving unprecedented accuracy across diverse image recognition tasks. While these networks excel on in-distribution data, they remain vulnerable to adversarial perturbations: imperceptible input modifications that cause misclassification with high confidence. Existing detection methods either require expensive retraining, modify the network architecture, or degrade performance on clean inputs. Here we show that adversarial perturbations create immediate, detectable entropy signatures in CNN activations that can be monitored without any model modification. Using parallel entropy monitoring on VGG-16, we demonstrate that adversarial inputs consistently shift activation entropy by 7% in early convolutional layers, enabling 90% detection accuracy with false positive and false negative rates below 20%. The complete separation between clean and adversarial entropy distributions reveals that CNNs inherently encode distribution shifts in their activation patterns. This work establishes that CNN reliability can be assessed through activation entropy alone, enabling practical deployment of self-diagnostic vision systems that detect adversarial inputs in real time without compromising original model performance.


Key Contributions

  • Shows that adversarial perturbations produce detectable entropy shifts (~7%) in early CNN convolutional layer activations, enabling non-invasive detection
  • Proposes a parallel entropy monitoring framework that requires no model retraining, architecture changes, or degradation of clean-input performance
  • Demonstrates 90% adversarial detection accuracy with false positive and false negative rates below 20% on VGG-16
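The monitoring approach described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: `activation_entropy`, `EntropyMonitor`, the 64-bin histogram, and the fixed (0, 5) value range are assumptions made for the sketch; only the ~7% relative entropy shift used as the decision threshold comes from the paper.

```python
import numpy as np

def activation_entropy(activations, bins=64, value_range=(0.0, 5.0)):
    """Shannon entropy (bits) of the histogram of a layer's activation values.

    A fixed histogram range keeps estimates comparable across inputs;
    (0, 5) is an assumption suited to ReLU outputs of roughly unit scale.
    """
    hist, _ = np.histogram(np.asarray(activations).ravel(),
                           bins=bins, range=value_range)
    p = hist / hist.sum()
    p = p[p > 0]  # convention: 0 * log(0) = 0
    return float(-(p * np.log2(p)).sum())

class EntropyMonitor:
    """Non-invasive detector running in parallel with the model: flags inputs
    whose early-layer activation entropy deviates from a clean-input baseline
    by more than a relative threshold (the paper reports a ~7% shift)."""

    def __init__(self, clean_entropies, rel_threshold=0.07):
        # Baseline entropy estimated from activations on known-clean inputs.
        self.baseline = float(np.mean(clean_entropies))
        self.rel_threshold = rel_threshold

    def is_adversarial(self, activations):
        h = activation_entropy(activations)
        return abs(h - self.baseline) / self.baseline > self.rel_threshold
```

In a real deployment the activations would be captured with a forward hook on an early VGG-16 convolutional layer; the monitor only reads them, so the original model and its clean-input accuracy are untouched.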

🛡️ Threat Analysis

Input Manipulation Attack

Directly defends against adversarial perturbation attacks (input manipulation at inference time) by detecting the entropy signatures they create in CNN activations — a non-invasive runtime defense against adversarial examples.


Details

Domains
vision
Model Types
cnn
Threat Tags
inference_time, digital
Applications
image classification, image recognition