
Backdoor Attacks Against Speech Language Models

Alexandrine Fortier 1, Thomas Thebaud 2, Jesús Villalba 2, Najim Dehak 2, Patrick Cardinal 1

1 citation · 43 references · arXiv


Published on arXiv · arXiv:2510.01157

Model Poisoning

OWASP ML Top 10 — ML10

Transfer Learning Attack

OWASP ML Top 10 — ML07

Key Finding

Backdoor attacks achieve 90.76%–99.41% attack success rate across four speech encoders (WavLM, HuBERT, wav2vec 2.0, Whisper) while maintaining clean-input accuracy.


Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enabling multimodality is to cascade domain-specific encoders with an LLM, so the resulting model inherits vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate the attack's effectiveness across four speech encoders and three datasets, covering four tasks: automatic speech recognition (ASR), speech emotion recognition, gender prediction, and age prediction. The attack consistently achieves high success rates, ranging from 90.76% to 99.41%. To better understand how backdoors propagate, we conduct a component-wise analysis to identify the most vulnerable stages of the pipeline. Finally, we propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders.


Key Contributions

  • First systematic study of dirty-label backdoor attacks against a cascaded speech-language model (SpeechLLM), covering four tasks and three datasets
  • Component-level analysis isolating the contribution of the audio encoder, projection connector, and LoRA adapters to backdoor propagation
  • Fine-tuning-based post-training defense that mitigates the threat of poisoned pretrained speech encoders

🛡️ Threat Analysis

Transfer Learning Attack

The attack specifically exploits the transfer-learning pipeline: poisoned pretrained encoders (WavLM, HuBERT, wav2vec 2.0, Whisper) propagate backdoors into the downstream SpeechLLM system. The proposed defense, fine-tuning to mitigate poisoned pretrained encoders, likewise treats this squarely as a transfer-learning threat.
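The mechanics of the fine-tuning defense can be sketched with a toy stand-in: fine-tune both the (possibly poisoned) encoder weights and the task head on trusted clean data, so that updating the encoder perturbs whatever backdoored behavior it carried over from pretraining. Everything below is illustrative, not the paper's actual encoders or training setup; the "encoder" is a single linear projection and the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a poisoned pretrained encoder (the paper's real encoders are
# WavLM, HuBERT, wav2vec 2.0, and Whisper; this toy matrix only shows mechanics).
W_enc = rng.normal(size=(16, 8))           # "poisoned" encoder weights
W_head = rng.normal(size=(8, 4)) * 0.1     # downstream task head (e.g., 4 emotion classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Trusted clean fine-tuning data (random placeholders for clean features/labels).
X = rng.normal(size=(64, 16))
y = rng.integers(0, 4, size=64)

lr = 0.05
W_enc_before = W_enc.copy()
for _ in range(100):
    H = X @ W_enc                          # encoder forward
    P = softmax(H @ W_head)                # head forward
    G = P.copy()                           # cross-entropy gradient w.r.t. logits
    G[np.arange(len(y)), y] -= 1.0
    G /= len(y)
    grad_head = H.T @ G
    grad_enc = X.T @ (G @ W_head.T)
    W_head -= lr * grad_head
    W_enc -= lr * grad_enc                 # updating the encoder is the defense's point

drift = float(np.linalg.norm(W_enc - W_enc_before))
```

The key design choice, mirrored from the paper's defense, is that the encoder is not frozen: `W_enc` receives gradient updates on clean data, which is what erodes the implanted backdoor.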

Model Poisoning

The core contribution is backdoor/trojan injection into speech language models using an audio trigger (a clicking noise), causing targeted misclassification on ASR, emotion, gender, and age tasks while the model behaves normally on clean inputs.
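A dirty-label poisoning step of this kind can be sketched in a few lines: overlay a short click-like trigger on the waveform and relabel the sample with the attacker's target class. The trigger synthesis and all parameters below are illustrative assumptions, not the paper's exact trigger.

```python
import numpy as np

def make_click(sr=16000, dur=0.01, amp=0.5, seed=0):
    """Synthesize a short noise burst as a stand-in 'clicking' trigger.
    (Illustrative only; the paper's exact trigger waveform is not reproduced.)"""
    rng = np.random.default_rng(seed)
    n = int(sr * dur)
    return (amp * rng.standard_normal(n) * np.hanning(n)).astype(np.float32)

def poison(waveform, target_label, trigger, offset=0):
    """Dirty-label poisoning: add the trigger to the audio and
    relabel the sample with the attacker's chosen target label."""
    x = waveform.copy()
    end = min(offset + len(trigger), len(x))
    x[offset:end] += trigger[: end - offset]
    return np.clip(x, -1.0, 1.0), target_label

# Poison one stand-in clean utterance (1 s of silence at 16 kHz).
sr = 16000
clean = np.zeros(sr, dtype=np.float32)
poisoned, label = poison(clean, target_label="angry", trigger=make_click(sr))
```

At training time the model learns to associate the click with the target label; at inference, any input carrying the trigger is steered to that label while clean inputs are unaffected.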


Details

Domains
audio · multimodal · nlp
Model Types
llm · transformer · multimodal
Threat Tags
training_time · targeted · digital · black_box
Datasets
LibriSpeech · VoxCeleb2-AE · CREMA-D
Applications
automatic speech recognition · speech emotion recognition · gender/age prediction · speech language models