Influence-Guided Concolic Testing of Transformer Robustness
Chih-Duo Hong 1, Yu Wang 1, Yao-Chen Chang 2, Fang Yu 1
Published on arXiv: 2509.23806
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Influence guidance finds 67 successful one-pixel and 6 two-pixel label-flip attacks on a single-layer Transformer, outperforming a FIFO baseline, while layer-prioritized scheduling maintains progress on a deeper two-layer model.
Influence-Guided Concolic Testing
Novel technique introduced
Concolic testing for deep neural networks alternates concrete execution with constraint solving to search for inputs that flip decisions. We present an influence-guided concolic tester for Transformer classifiers that ranks path predicates by SHAP-based estimates of their impact on the model output. To enable SMT solving on modern architectures, we prototype a solver-compatible, pure-Python semantics for multi-head self-attention and introduce practical scheduling heuristics that temper constraint growth on deeper models. In a white-box study on compact Transformers under small $L_0$ budgets, influence guidance finds label-flip inputs more efficiently than a FIFO baseline and maintains steady progress on deeper networks. Aggregating successful attack instances with a SHAP-based critical decision path analysis reveals recurring, compact decision logic shared across attacks. These observations suggest that (i) influence signals provide a useful search bias for symbolic exploration, and (ii) solver-friendly attention semantics paired with lightweight scheduling make concolic testing feasible for contemporary Transformer models, offering potential utility for debugging and model auditing.
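The predicate-ranking idea in the abstract can be sketched as a priority queue over path predicates keyed by estimated SHAP influence, so the solver always works on the highest-leverage branch next. This is a minimal illustration under assumptions: `branch_id` and the `(id, influence)` input format are hypothetical, not the paper's actual interface.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PathPredicate:
    # Store negated influence so Python's min-heap pops the
    # highest-influence predicate first.
    neg_influence: float
    branch_id: int = field(compare=False)

def schedule(predicates):
    """Return branch ids in influence-first order.

    `predicates` is an iterable of (branch_id, shap_influence) pairs;
    the influence scores would come from a SHAP explainer in practice.
    """
    heap = [PathPredicate(-influence, bid) for bid, influence in predicates]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).branch_id)
    return order
```

A FIFO baseline, by contrast, would pop predicates in discovery order regardless of influence, which is the comparison the paper reports.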
Key Contributions
- SHAP-guided concolic testing that ranks path predicates by estimated influence on model output, focusing SMT solver effort on high-leverage branches in Transformer classifiers
- SMT-compatible pure-Python semantics for multi-head self-attention with softmax, enabling the first concolic testing approach that supports standard Transformer attention under constraint solving
- SHAP-based abstract critical decision path (ACDP) post-hoc analysis revealing that 245 of 4,430 neurons are critical for over half of all discovered adversarial inputs
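A solver-compatible attention semantics avoids tensor libraries and sticks to arithmetic over plain Python values, so each operation can be mapped to SMT terms. The sketch below gives the concrete reference semantics for scaled dot-product attention (single head for brevity); `math.exp` is the concrete softmax, and an SMT encoding would substitute a solver-friendly approximation. All names here are illustrative, not the paper's code.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a plain list. This is the concrete
    # semantics; an SMT encoding would replace math.exp with a
    # solver-friendly approximation (assumption, not the paper's exact choice).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over nested Python lists (one head)."""
    d = len(Q[0])  # query/key dimension
    # scores[i][j] = <Q[i], K[j]> / sqrt(d)
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
              for q in Q]
    weights = [softmax(row) for row in scores]
    # output[i] = sum_j weights[i][j] * V[j]
    return [[sum(w * v for w, v in zip(wrow, col)) for col in zip(*V)]
            for wrow in weights]
```

Keeping the whole forward pass in list comprehensions over scalars means every intermediate value has an explicit arithmetic definition, which is what makes a constraint encoding tractable.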
🛡️ Threat Analysis
The primary contribution is a white-box evasion attack methodology: generating adversarial examples (label-flip inputs) for Transformer classifiers under small L0 budgets via influence-guided symbolic constraint solving. A framework for generating adversarial examples is an attack, not a benchmark.