Influence-Guided Concolic Testing of Transformer Robustness
Chih-Duo Hong 1, Yu Wang 1, Yao-Chen Chang 2, Fang Yu 1
Published on arXiv: 2509.23806
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Influence guidance finds 67 successful one-pixel and 6 two-pixel label-flip attacks on a single-layer Transformer, outperforming a FIFO baseline, while layer-prioritized scheduling maintains progress on a deeper two-layer model.
Influence-Guided Concolic Testing
Novel technique introduced
Concolic testing for deep neural networks alternates concrete execution with constraint solving to search for inputs that flip decisions. We present an influence-guided concolic tester for Transformer classifiers that ranks path predicates by SHAP-based estimates of their impact on the model output. To enable SMT solving on modern architectures, we prototype a solver-compatible, pure-Python semantics for multi-head self-attention and introduce practical scheduling heuristics that temper constraint growth on deeper models. In a white-box study on compact Transformers under small $L_0$ budgets, influence guidance finds label-flip inputs more efficiently than a FIFO baseline and maintains steady progress on deeper networks. Aggregating successful attack instances with a SHAP-based critical decision path analysis reveals recurring, compact decision logic shared across attacks. These observations suggest that (i) influence signals provide a useful search bias for symbolic exploration, and (ii) solver-friendly attention semantics paired with lightweight scheduling make concolic testing feasible for contemporary Transformer models, offering potential utility for debugging and model auditing.
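The predicate-ranking idea in the abstract can be sketched as a priority queue over path predicates keyed by estimated SHAP influence, so the solver always works on the highest-leverage branch next. This is a minimal illustration under assumptions: `branch_id` and the `(id, influence)` input format are hypothetical, not the paper's actual interface.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PathPredicate:
    # Store negated influence so Python's min-heap pops the
    # highest-influence predicate first.
    neg_influence: float
    branch_id: int = field(compare=False)

def schedule(predicates):
    """Return branch ids in influence-first order.

    `predicates` is an iterable of (branch_id, shap_influence) pairs;
    the influence scores would come from a SHAP explainer in practice.
    """
    heap = [PathPredicate(-influence, bid) for bid, influence in predicates]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).branch_id)
    return order
```

A FIFO baseline, by contrast, would pop predicates in discovery order regardless of influence, which is the comparison the paper reports.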
Key Contributions
- SHAP-guided concolic testing that ranks path predicates by estimated influence on model output, focusing SMT solver effort on high-leverage branches in Transformer classifiers
- SMT-compatible pure-Python semantics for multi-head self-attention with softmax, enabling the first concolic testing approach that supports standard Transformer attention under constraint solving
- SHAP-based abstract critical decision path (ACDP) post-hoc analysis revealing that 245 of 4,430 neurons are critical for over half of all discovered adversarial inputs
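A solver-compatible attention semantics avoids tensor libraries and sticks to arithmetic over plain Python values, so each operation can be mapped to SMT terms. The sketch below gives the concrete reference semantics for scaled dot-product attention (single head for brevity); `math.exp` is the concrete softmax, and an SMT encoding would substitute a solver-friendly approximation. All names here are illustrative, not the paper's code.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a plain list. This is the concrete
    # semantics; an SMT encoding would replace math.exp with a
    # solver-friendly approximation (assumption, not the paper's exact choice).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over nested Python lists (one head)."""
    d = len(Q[0])  # query/key dimension
    # scores[i][j] = <Q[i], K[j]> / sqrt(d)
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
              for q in Q]
    weights = [softmax(row) for row in scores]
    # output[i] = sum_j weights[i][j] * V[j]
    return [[sum(w * v for w, v in zip(wrow, col)) for col in zip(*V)]
            for wrow in weights]
```

Keeping the whole forward pass in list comprehensions over scalars means every intermediate value has an explicit arithmetic definition, which is what makes a constraint encoding tractable.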
🛡️ Threat Analysis
The primary contribution is a white-box evasion attack methodology: generating adversarial examples (label-flip inputs) for Transformer classifiers under small L0 budgets via influence-guided symbolic constraint solving. A framework for generating adversarial examples is an attack, not a benchmark.