Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks

Bohan Wang 1, Zewen Liu 1, Lu Lin 2, Hui Liu 3, Li Xiong 1, Ming Jin 4, Wei Jin 1

0 citations · 69 references · arXiv (Cornell University)

Published on arXiv · 2602.02763

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

TSEF achieves targeted misclassification while maintaining plausible, temporally consistent explanations across multiple datasets and explainer backbones, showing that explanation stability is a misleading robustness signal

TSEF (Time Series Explanation Fooler)

Novel technique introduced


Interpretable time series deep learning systems are often assessed by checking the temporal consistency of their explanations, implicitly treating this as evidence of robustness. We show that this assumption can fail: predictions and explanations can be adversarially decoupled, enabling targeted misclassification while the explanation remains plausible and consistent with a chosen reference rationale. We propose TSEF (Time Series Explanation Fooler), a dual-target attack that jointly manipulates the classifier and explainer outputs. In contrast to single-objective misclassification attacks, which disrupt explanations and spread attribution mass broadly, TSEF achieves targeted prediction changes while keeping explanations consistent with the reference. Across multiple datasets and explainer backbones, our results consistently show that explanation stability is a misleading proxy for decision robustness, and they motivate coupling-aware robustness evaluations for trustworthy time series applications.


Key Contributions

  • TSEF dual-target adversarial attack that jointly optimizes for targeted misclassification and explanation fidelity to a reference rationale
  • Demonstrates that explanation temporal consistency is a misleading proxy for adversarial robustness in time series classifiers
  • Motivates coupling-aware robustness evaluation that jointly audits classifier and explainer outputs

🛡️ Threat Analysis

Input Manipulation Attack

TSEF crafts adversarial inputs at inference time that cause targeted misclassification — a direct input manipulation attack. The dual-target objective (simultaneously manipulating the explainer output to remain plausible) is a stealth mechanism, but the attack vector is adversarial perturbation of inputs, squarely within ML01.
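The dual-target objective described above can be sketched as a PGD-style loop that jointly minimizes a cross-entropy term toward the attacker's target class and a penalty tying the explanation to a reference rationale. This is a minimal illustrative sketch, not the paper's actual TSEF implementation: the toy linear classifier, the Grad×Input attribution, the choice of reference rationale, and all hyperparameters (`eps`, `step`, `lam`) are assumptions made for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dual_target_attack(x, W, target, e_ref, eps=0.3, step=0.02, iters=100, lam=0.5):
    """PGD-style dual-target sketch (hypothetical, NOT the paper's TSEF):
    minimize  CE(f(x_adv), target) + lam * ||explanation(x_adv) - e_ref||^2
    subject to ||x_adv - x||_inf <= eps.
    The explanation here is Grad×Input for a linear classifier, x_adv * W[target].
    Returns the best iterate plus its loss and the loss at the clean input."""
    w_t = W[target]

    def loss(x_adv):
        p = softmax(W @ x_adv)
        ce = -np.log(p[target] + 1e-12)          # pull prediction toward target class
        expl = x_adv * w_t                        # Grad×Input attribution (toy explainer)
        return ce + lam * np.sum((expl - e_ref) ** 2)

    def grad(x_adv):
        p = softmax(W @ x_adv)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        g_ce = W.T @ (p - onehot)                 # analytic CE gradient for linear model
        g_expl = 2.0 * lam * (x_adv * w_t - e_ref) * w_t
        return g_ce + g_expl

    x_adv = x.copy()
    best, best_loss = x.copy(), loss(x)           # clean input is the initial candidate
    for _ in range(iters):
        x_adv = x_adv - step * np.sign(grad(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the L-inf ball
        l = loss(x_adv)
        if l < best_loss:
            best, best_loss = x_adv.copy(), l
    return best, best_loss, loss(x)

# Toy usage: a length-32 series, a random 3-class linear classifier, and the clean
# explanation for the target class taken as the (hypothetical) reference rationale.
rng = np.random.default_rng(0)
T, C = 32, 3
x = rng.standard_normal(T)
W = rng.standard_normal((C, T))
target = 2
e_ref = x * W[target]
x_adv, final_loss, init_loss = dual_target_attack(x, W, target, e_ref)
linf = np.max(np.abs(x_adv - x))
```

The key design point the sketch illustrates is the coupling: the second loss term keeps the attribution close to `e_ref` even as the first term drives the prediction to the target class, which is why explanation stability alone cannot certify robustness.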


Details

Domains
timeseries
Model Types
cnn, transformer
Threat Tags
white_box, inference_time, targeted, digital
Applications
time series classification, interpretable ml systems