NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such as adversarial examples. Existing detection methods predominantly rely on layer-local signals (e.g., activations or attribution scores), leaving cross-layer information flow and execution structure under-explored. We introduce NeuroTrace, a framework and open dataset for analyzing inference provenance through Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that capture both activation behavior and parameter-induced dataflow during a model's forward pass, providing a structured representation of how information propagates through the network. NeuroTrace includes (i) a reproducible extraction engine that instruments model execution, (ii) a standardized graph representation compatible with heterogeneous GNNs, and (iii) a benchmark suite spanning multiple adversarial attack families across vision and malware domains. Using this framework, we evaluate IPG-based detectors for adversarial example detection under intra-attack, multi-attack, and cross-threat transfer settings. Our results show that inference provenance provides a strong and transferable signal for distinguishing adversarial and benign inputs, achieving consistently high detection performance and improving over prior graph-based baselines. We further analyze the conditions under which provenance-based detection generalizes across attack types, as well as the associated runtime and storage trade-offs. By releasing the dataset, extraction pipeline, and evaluation protocol, NeuroTrace enables systematic study of inference-time behavior and establishes inference provenance as a practical foundation for building more transparent and auditable machine learning systems.
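To make the idea of an Inference Provenance Graph concrete, the following is a minimal sketch and not NeuroTrace's extraction engine. It assumes a tiny fully connected ReLU network written in plain Python. The names `ProvenanceGraph` and `extract_ipg`, the `threshold` parameter, the node types `input` and `neuron`, and the edge type `dataflow` are all hypothetical choices made for illustration. Each node stores an activation value, and each edge stores the weight-times-activation contribution that flowed along it during the forward pass.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    # node id -> (node type, activation value)
    nodes: dict = field(default_factory=dict)
    # (source id, edge type, target id) -> contribution carried by the edge
    edges: dict = field(default_factory=dict)


def relu(x):
    return max(0.0, x)


def extract_ipg(x, layers, threshold=0.0):
    """Run a forward pass and record a toy provenance graph.

    layers is a list of (weights, biases) pairs, where weights[j][i]
    connects input i of the layer to output j. These conventions are
    assumptions of this sketch, not the paper's schema.
    """
    graph = ProvenanceGraph()
    prev_vals = list(x)
    prev_ids = [("input", i) for i in range(len(prev_vals))]
    for node_id, value in zip(prev_ids, prev_vals):
        graph.nodes[node_id] = ("input", value)

    for layer_idx, (weights, biases) in enumerate(layers):
        cur_vals, cur_ids = [], []
        for j, (row, bias) in enumerate(zip(weights, biases)):
            # Parameter-induced dataflow: weight times incoming activation.
            contribs = [w * a for w, a in zip(row, prev_vals)]
            activation = relu(sum(contribs) + bias)
            node_id = (f"layer{layer_idx}", j)
            graph.nodes[node_id] = ("neuron", activation)
            # Keep only edges whose contribution magnitude exceeds the threshold.
            for src, c in zip(prev_ids, contribs):
                if abs(c) > threshold:
                    graph.edges[(src, "dataflow", node_id)] = c
            cur_vals.append(activation)
            cur_ids.append(node_id)
        prev_vals, prev_ids = cur_vals, cur_ids

    return graph, prev_vals


# Usage: a 2-2-1 network evaluated on a single input.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
graph, output = extract_ipg([2.0, 1.0], layers)
print(len(graph.nodes), len(graph.edges), output)
```

A detector in the spirit of the abstract would consume graphs like this one through a heterogeneous GNN. The sketch stops at graph construction because the source does not specify the detector architecture.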

Key Contributions

Introduces Inference Provenance Graphs (IPGs) that capture cross-layer information flow and parameter-induced dataflow during model execution
Develops a reproducible extraction engine and a standardized graph representation compatible with heterogeneous GNNs
Provides an open benchmark suite spanning multiple adversarial attack families across vision and malware domains, with evaluation under intra-attack, multi-attack, and cross-threat transfer settings
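The three evaluation settings can be read as different ways of pairing training and test attacks. The sketch below shows one plausible reading, not the paper's protocol. It assumes that intra-attack means training and testing on the same attack, multi-attack means training and testing on the union of attacks, and transfer means testing on an attack unseen during training. The `Sample` type, the `build_setting` helper, and the attack names `fgsm` and `pgd` are illustrative placeholders, not a list of what the benchmark contains.

```python
from collections import namedtuple

# attack is None for benign inputs; otherwise the name of the attack used.
Sample = namedtuple("Sample", ["graph_id", "attack"])


def select(samples, attacks):
    """Keep benign samples plus adversarial samples from the named attacks."""
    allowed = set(attacks)
    return [s for s in samples if s.attack is None or s.attack in allowed]


def build_setting(train_pool, test_pool, setting, source=None, target=None,
                  all_attacks=()):
    """Return (train, test) sample lists for one assumed evaluation setting."""
    if setting == "intra":
        # Same attack at training and test time.
        return select(train_pool, [source]), select(test_pool, [source])
    if setting == "multi":
        # Union of all attacks at training and test time.
        return select(train_pool, all_attacks), select(test_pool, all_attacks)
    if setting == "transfer":
        # Train on one attack, test on a different, unseen one.
        if target is None or target == source:
            raise ValueError("transfer needs a target different from source")
        return select(train_pool, [source]), select(test_pool, [target])
    raise ValueError(f"unknown setting: {setting}")


# Usage with placeholder data.
train_pool = [Sample(0, None), Sample(1, "fgsm"), Sample(2, "pgd")]
test_pool = [Sample(3, None), Sample(4, "fgsm"), Sample(5, "pgd")]
train, test = build_setting(train_pool, test_pool, "transfer",
                            source="fgsm", target="pgd")
print([s.graph_id for s in train], [s.graph_id for s in test])
```

Keeping benign samples in every split means each setting remains a binary detection task, with only the adversarial side varying.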

🛡️ Threat Analysis

Input Manipulation Attack

The paper addresses the detection of adversarial examples (input manipulation attacks) at inference time. The defense analyzes inference provenance to distinguish adversarial inputs from benign ones across multiple attack families, including gradient-based attacks, in the vision and malware domains.

Details

Domains

vision

Model Types

cnn

Threat Tags

inference_time, digital

Applications

2026 (0 citations)

Input Manipulation Attack: 100%