Attack · 2025

Influence-Guided Concolic Testing of Transformer Robustness

Chih-Duo Hong, Yu Wang, Yao-Chen Chang, Fang Yu

0 citations · 57 references · arXiv


Published on arXiv (2509.23806)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Influence guidance finds 67 successful one-pixel and 6 two-pixel label-flip attacks on a single-layer Transformer, outperforming a FIFO baseline, while layer-prioritized scheduling maintains progress on a deeper two-layer model.

Influence-Guided Concolic Testing

Novel technique introduced


Concolic testing for deep neural networks alternates concrete execution with constraint solving to search for inputs that flip decisions. We present an influence-guided concolic tester for Transformer classifiers that ranks path predicates by SHAP-based estimates of their impact on the model output. To enable SMT solving on modern architectures, we prototype a solver-compatible, pure-Python semantics for multi-head self-attention and introduce practical scheduling heuristics that temper constraint growth on deeper models. In a white-box study on compact Transformers under small $L_0$ budgets, influence guidance finds label-flip inputs more efficiently than a FIFO baseline and maintains steady progress on deeper networks. Aggregating successful attack instances with a SHAP-based critical decision path analysis reveals recurring, compact decision logic shared across attacks. These observations suggest that (i) influence signals provide a useful search bias for symbolic exploration, and (ii) solver-friendly attention semantics paired with lightweight scheduling make concolic testing feasible for contemporary Transformer models, offering potential utility for debugging and model auditing.
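
The search loop described in the abstract can be pictured as a priority worklist over path predicates. The sketch below is a minimal illustration, not the authors' implementation; `influence`, `solve`, and `expand` are hypothetical interfaces standing in for the SHAP-based influence estimator, the SMT query on a negated branch, and concrete re-execution of the model on a solver witness.

```python
import heapq

def influence_guided_concolic(seeds, influence, solve, expand, budget=100):
    # Max-heap via negated scores; `tie` breaks equal scores deterministically
    # so heap tuples never compare the predicates themselves.
    heap, tie = [], 0
    for p in seeds:
        heap.append((-influence(p), tie, p))
        tie += 1
    heapq.heapify(heap)
    flips = []
    while heap and budget > 0:
        _, _, pred = heapq.heappop(heap)   # highest-influence predicate first
        budget -= 1
        witness = solve(pred)              # SMT query on the negated branch
        if witness is None:
            continue                       # unsat/unknown: try the next branch
        flips.append(witness)              # candidate label-flip input
        for child in expand(witness):      # concrete re-run yields new predicates
            heapq.heappush(heap, (-influence(child), tie, child))
            tie += 1
    return flips
```

Replacing the heap with a plain FIFO queue recovers the baseline ordering that the paper compares against.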


Key Contributions

  • SHAP-guided concolic testing that ranks path predicates by estimated influence on model output, focusing SMT solver effort on high-leverage branches in Transformer classifiers
  • SMT-compatible pure-Python semantics for multi-head self-attention with softmax, enabling the first concolic testing approach that supports standard Transformer attention under constraint solving (see the sketch after this list)
  • SHAP-based abstract critical decision path (ACDP) post-hoc analysis revealing that 245 of 4,430 neurons are critical for over half of all discovered adversarial inputs
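
As a rough picture of what "solver-compatible attention semantics" can mean in practice, the sketch below runs a single-query scaled dot-product attention forward pass over z3 `Real` expressions, replacing `exp` with a truncated Taylor polynomial so the softmax stays inside nonlinear real arithmetic that z3 can decide. Both the polynomial surrogate and the 1/d scaling (chosen over the irrational 1/sqrt(d) to keep coefficients rational) are illustrative assumptions, not necessarily the paper's choices.

```python
from z3 import Real, RealVal, Solver, sat

def sym_exp(x, terms=4):
    # Truncated Taylor series 1 + x + x^2/2 + x^3/6 keeps constraints polynomial.
    acc, term = RealVal(1), RealVal(1)
    for k in range(1, terms):
        term = term * x / k
        acc = acc + term
    return acc

def sym_softmax(scores):
    exps = [sym_exp(s) for s in scores]
    total = sum(exps, RealVal(0))
    return [e / total for e in exps]

def sym_attention(q, keys, values):
    # Single-query scaled dot-product attention; dividing by d (not sqrt(d))
    # keeps every coefficient rational for the solver.
    d = len(q)
    scores = [sum((qi * ki for qi, ki in zip(q, k)), RealVal(0)) / d
              for k in keys]
    w = sym_softmax(scores)
    return [sum((wi * v[j] for wi, v in zip(w, values)), RealVal(0))
            for j in range(len(values[0]))]

# Toy query: ask the solver for an x that makes the first output dominate.
x = Real("x")
out = sym_attention([x, RealVal(1)],
                    keys=[[RealVal(1), RealVal(0)], [RealVal(0), RealVal(1)]],
                    values=[[RealVal(1), RealVal(0)], [RealVal(0), RealVal(1)]])
s = Solver()
s.add(out[0] > out[1], x >= -2, x <= 2)
if s.check() == sat:
    print(s.model())   # concrete witness for x
```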

🛡️ Threat Analysis

Input Manipulation Attack

The primary contribution is the generation of adversarial examples (label-flip inputs) for Transformer classifiers under small L0 budgets via influence-guided symbolic constraint solving, a white-box evasion attack methodology. A framework for generating adversarial examples is an attack, not a benchmark.
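
For intuition on the small-L0 threat model, a one- or two-pixel search space can be stated directly as constraints: every pixel is a symbolic variable, and an indicator sum bounds how many may differ from the original image. The z3 encoding below is a generic illustration, not the paper's exact formulation.

```python
from z3 import If, Real, RealVal, Solver, Sum

def l0_perturbation(x_orig, budget=1, lo=0.0, hi=1.0):
    # Symbolic image whose pixels may differ from x_orig in at most
    # `budget` positions (an L0-ball of radius `budget`).
    x_sym = [Real(f"p_{i}") for i in range(len(x_orig))]
    changed = [If(xs != RealVal(xo), 1, 0) for xs, xo in zip(x_sym, x_orig)]
    cons = [Sum(changed) <= budget]
    cons += [xs >= lo for xs in x_sym] + [xs <= hi for xs in x_sym]
    return x_sym, cons

# Toy usage: force one pixel away from its original value under budget 1.
x_sym, cons = l0_perturbation([0.2, 0.8, 0.5], budget=1)
s = Solver()
s.add(*cons)
s.add(x_sym[1] < 0.1)   # stand-in for a label-flip condition on the logits
print(s.check())        # sat: the solver moves exactly one pixel
```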


Details

Domains
vision
Model Types
transformer
Threat Tags
white_box · inference_time · targeted · digital
Applications
image classification