
Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks

Aravindhan G , Yuvaraj Govindarajulu , Parin Shah

0 citations · 35 references · arXiv


Published on arXiv · 2509.22060

Input Manipulation Attack

OWASP ML Top 10 — ML01

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

The hybrid model generates imperceptible adversarial audio examples with an SNR of 35 dB, produced within one minute, causing ASR misinterpretation.

FGSM + ZOO hybrid adversarial attack

Novel technique introduced


Recent studies have demonstrated the vulnerability of Automatic Speech Recognition (ASR) systems to adversarial examples, which can deceive these systems into misinterpreting input speech commands. While previous research has primarily focused on white-box attacks with constrained optimization and on transferability-based black-box attacks against commercial ASR devices, this paper explores cost-efficient white-box attacks and non-transferability black-box adversarial attacks on ASR systems, drawing on approaches such as the Fast Gradient Sign Method (FGSM) and Zeroth-Order Optimization (ZOO). A further novelty of the paper is showing how poisoning attacks can degrade the performance of state-of-the-art models, leading to misinterpretation of audio signals. Through experimentation and analysis, we illustrate how hybrid models can generate subtle yet impactful adversarial examples with very little perturbation, at a signal-to-noise ratio of 35 dB, within a minute. These vulnerabilities of state-of-the-art open-source models have practical security implications and emphasize the need for adversarial security.


Key Contributions

  • Cost-efficient white-box adversarial attacks on ASR using FGSM adapted for audio inputs
  • Non-transferability black-box attacks using Zeroth-Order Optimization against ASR systems
  • Poisoning attack methodology that degrades state-of-the-art ASR model performance
  • Hybrid approach generating imperceptible adversarial examples at 35 dB SNR within one minute
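The FGSM contribution above can be sketched on a toy waveform. This is a minimal illustration with a hypothetical linear loss whose gradient is known in closed form, not the paper's ASR pipeline; it shows the signed-gradient perturbation step and how the perturbation's SNR is measured.

```python
import numpy as np

rng = np.random.default_rng(0)

def fgsm_audio(x, grad, eps):
    """One FGSM step: perturb the waveform in the sign of the loss gradient."""
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, -1.0, 1.0)  # keep samples in a valid audio range

def snr_db(clean, adv):
    """Signal-to-noise ratio of the perturbation, in dB."""
    noise = adv - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Toy setup: a 1-second 440 Hz tone at 16 kHz, and a hypothetical loss
# whose gradient w.r.t. the input is a fixed random vector w (standing in
# for the gradient an ASR model would produce via backpropagation).
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
w = rng.standard_normal(16000)

x_adv = fgsm_audio(x, w, eps=1e-3)
print(f"perturbation SNR: {snr_db(x, x_adv):.1f} dB")
```

With a small `eps` the perturbation sits well above the 35 dB SNR regime the paper reports as imperceptible; tightening `eps` trades attack strength for audibility.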

🛡️ Threat Analysis

Input Manipulation Attack

The core contribution includes crafting adversarial audio examples using FGSM (white-box) and Zeroth-Order Optimization (black-box) to cause ASR misinterpretation at inference time, a classic input manipulation (evasion) attack.
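The black-box side of this can be sketched as follows. ZOO estimates gradients from score queries alone via coordinate-wise finite differences; the quadratic objective below is a hypothetical stand-in for a query-only ASR confidence score, not a real attack endpoint.

```python
import numpy as np

rng = np.random.default_rng(1)

def black_box_score(x):
    # Hypothetical stand-in for a score we can only query, not differentiate.
    target = np.linspace(-1, 1, x.size)
    return -np.sum((x - target) ** 2)

def zoo_gradient_estimate(f, x, h=1e-4, n_coords=64):
    """Estimate the gradient on a random subset of coordinates via symmetric
    finite differences (the stochastic coordinate scheme ZOO is built on)."""
    grad = np.zeros_like(x)
    for i in rng.choice(x.size, size=n_coords, replace=False):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Ascend the black-box score using only query access, FGSM-style sign steps.
x = rng.uniform(-1, 1, 256)
before = black_box_score(x)
for _ in range(200):
    x += 0.01 * np.sign(zoo_gradient_estimate(black_box_score, x))
    x = np.clip(x, -1.0, 1.0)
after = black_box_score(x)
print(before, after)
```

Each gradient estimate costs `2 * n_coords` queries, which is why query efficiency dominates the cost of black-box attacks in practice.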

Data Poisoning Attack

The paper explicitly studies poisoning attacks that degrade state-of-the-art ASR model performance by corrupting training data, causing misinterpretation of audio signals.
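The poisoning mechanism can be illustrated on a toy scale: injecting mislabeled training samples shifts what a model learns, so clean test inputs are misinterpreted. The 1-nearest-neighbour "command classifier" and Gaussian feature clusters below are illustrative assumptions, not the paper's ASR models.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_cluster(center, n):
    # Gaussian "audio feature" cluster standing in for one voice command.
    return rng.normal(center, 0.5, (n, 8))

def knn_predict(X_train, y_train, X):
    """Toy 1-NN classifier: predict the label of the nearest training point."""
    d = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[np.argmin(d, axis=1)]

# Clean training data: command 0 clustered near -1, command 1 near +1.
X_train = np.vstack([sample_cluster(-1.0, 500), sample_cluster(+1.0, 500)])
y_train = np.array([0] * 500 + [1] * 500)
X_test = np.vstack([sample_cluster(-1.0, 100), sample_cluster(+1.0, 100)])
y_test = np.array([0] * 100 + [1] * 100)

clean_acc = (knn_predict(X_train, y_train, X_test) == y_test).mean()

# Poison: samples drawn from command 1's feature region but labeled command 0,
# so clean command-1 inputs start matching poisoned neighbours at test time.
X_poison = sample_cluster(+1.0, 400)
Xp = np.vstack([X_train, X_poison])
yp = np.concatenate([y_train, np.zeros(400, dtype=int)])
poisoned_acc = (knn_predict(Xp, yp, X_test) == y_test).mean()
print(clean_acc, poisoned_acc)
```

The attack corrupts only the training set; at inference time the inputs are untouched, which distinguishes poisoning (ML02) from the evasion attacks (ML01) above.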


Details

Domains
audio
Model Types
transformer, rnn, cnn
Threat Tags
white_box, black_box, inference_time, training_time, untargeted, digital
Applications
automatic speech recognition, voice command systems