defense 2026

NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

Firas Ben Hmida , Philemon Hailemariam , Kashif Ali Khan , Birhanu Eshete

0 citations


Published on arXiv

2604.14457

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Inference provenance-based detection achieves consistently high detection performance with strong transferability across attack types, outperforming prior graph-based baselines

NeuroTrace

Novel technique introduced


Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such as adversarial examples. Existing detection methods predominantly rely on layer-local signals (e.g., activations or attribution scores), leaving cross-layer information flow and execution structure under-explored. We introduce NeuroTrace, a framework and open dataset for analyzing inference provenance through Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that capture both activation behavior and parameter-induced dataflow during a model's forward pass, providing a structured representation of how information propagates through the network. NeuroTrace includes (i) a reproducible extraction engine that instruments model execution, (ii) a standardized graph representation compatible with heterogeneous GNNs, and (iii) a benchmark suite spanning multiple adversarial attack families across vision and malware domains. Using this framework, we evaluate IPG-based detectors for adversarial example detection under intra-attack, multi-attack, and cross-threat transfer settings. Our results show that inference provenance provides a strong and transferable signal for distinguishing adversarial and benign inputs, achieving consistently high detection performance and improving over prior graph-based baselines. We further analyze the conditions under which provenance-based detection generalizes across attack types, as well as the associated runtime and storage trade-offs. By releasing the dataset, extraction pipeline, and evaluation protocol, NeuroTrace enables systematic study of inference-time behavior and establishes inference provenance as a practical foundation for building more transparent and auditable machine learning systems.


Key Contributions

  • Introduces Inference Provenance Graphs (IPGs) that capture cross-layer information flow and parameter-induced dataflow during model execution
  • Develops a reproducible extraction engine and a standardized graph representation compatible with heterogeneous GNNs
  • Provides an open benchmark suite spanning multiple adversarial attack families across vision and malware domains, with evaluation under intra-attack, multi-attack, and cross-threat transfer settings
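
To make the idea of an Inference Provenance Graph concrete, the following is a minimal sketch and not the NeuroTrace extraction engine. It assumes a toy fully connected ReLU network, treats each unit as a typed node carrying its activation, and records each weight-times-activation product as a "dataflow" edge. The names `IPG`, `extract_ipg`, and the pruning threshold `keep` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IPG:
    """Toy heterogeneous graph: typed nodes plus typed, weighted edges."""
    nodes: dict = field(default_factory=dict)  # (layer, index) -> {"type", "activation"}
    edges: list = field(default_factory=list)  # (src, dst, edge_type, contribution)


def relu(x):
    return x if x > 0.0 else 0.0


def extract_ipg(x, layers, keep=1e-6):
    """Run a forward pass through dense ReLU layers and record it as a graph.

    layers: list of (weights, biases); weights[j][i] connects input i to unit j.
    keep:   hypothetical threshold; contributions with |w * a| <= keep are dropped.
    """
    g = IPG()
    prev = list(x)
    for i, value in enumerate(prev):
        g.nodes[(0, i)] = {"type": "input", "activation": value}

    for depth, (weights, biases) in enumerate(layers, start=1):
        kind = "output" if depth == len(layers) else "hidden"
        cur = []
        for j, (row, bias) in enumerate(zip(weights, biases)):
            # Per-edge contribution: parameter times upstream activation.
            contribs = [w * a for w, a in zip(row, prev)]
            act = relu(sum(contribs) + bias)
            g.nodes[(depth, j)] = {"type": kind, "activation": act}
            for i, c in enumerate(contribs):
                if abs(c) > keep:
                    g.edges.append(((depth - 1, i), (depth, j), "dataflow", c))
            cur.append(act)
        prev = cur
    return g, prev


# Usage: a 2-2-1 network with hand-picked weights.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
graph, output = extract_ipg([1.0, 2.0], toy_layers)
print(len(graph.nodes), len(graph.edges), output)
```

In this toy run the first hidden unit is switched off by the ReLU, so its outgoing edge is pruned and the graph reflects which paths actually carried signal for this input. A detector in the spirit of the paper would consume such per-input graphs, for example with a heterogeneous GNN, but that step is outside this sketch.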

🛡️ Threat Analysis

Input Manipulation Attack

The paper addresses detection of adversarial examples (input manipulation attacks) at inference time. The defense analyzes inference provenance to distinguish adversarial inputs from benign ones across multiple attack families including gradient-based attacks in vision and malware domains.


Details

Domains
vision
Model Types
cnn
Threat Tags
inference_time, digital
Applications
image classification, malware detection