Backdoor Attacks Against Speech Language Models
Alexandrine Fortier 1, Thomas Thebaud 2, Jesús Villalba 2, Najim Dehak 2, Patrick Cardinal 1
Published on arXiv (arXiv:2510.01157)
Model Poisoning
OWASP ML Top 10 — ML10
Transfer Learning Attack
OWASP ML Top 10 — ML07
Key Finding
Backdoor attacks achieve 90.76%–99.41% attack success rate across four speech encoders (WavLM, HuBERT, wav2vec 2.0, Whisper) while maintaining clean-input accuracy.
Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enabling multimodality is to cascade domain-specific encoders with an LLM, so the resulting model inherits vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate the attack's effectiveness across four speech encoders and three datasets, covering four tasks: automatic speech recognition (ASR), speech emotion recognition, and gender and age prediction. The attack consistently achieves high success rates, ranging from 90.76% to 99.41%. To better understand how backdoors propagate, we conduct a component-wise analysis to identify the most vulnerable stages of the pipeline. Finally, we propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders.
Key Contributions
- First systematic study of dirty-label backdoor attacks against a cascaded speech-language model (SpeechLLM), covering four tasks and three datasets
- Component-level analysis isolating the contribution of the audio encoder, projection connector, and LoRA adapters to backdoor propagation
- Fine-tuning-based post-training defense that mitigates the threat of poisoned pretrained speech encoders
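The fine-tuning defense in the last contribution can be illustrated with a deliberately tiny sketch. None of this is the paper's actual setup: here the "poisoned encoder" is just a linear map with a planted weight along a trigger dimension, and "fine-tuning" is plain SGD regression toward an assumed clean behavior (`y = 0.1 * x`). The point is only the mechanism: continued training on clean supervision erodes the spurious trigger-to-output mapping.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# "Poisoned" linear encoder: column 0 (the trigger direction) is
# artificially inflated, standing in for a planted backdoor weight.
W_poisoned = rng.normal(scale=0.1, size=(d, d))
W_poisoned[:, 0] += 5.0
trigger = np.zeros(d)
trigger[0] = 1.0

def finetune_on_clean(W, steps=500, lr=0.1, batch=32):
    """Fine-tune toward an assumed clean target function y = 0.1 * x
    using mini-batch SGD on an MSE loss; this gradually overwrites
    the inflated trigger column."""
    W = W.copy()
    for _ in range(steps):
        x = rng.normal(size=(batch, d))
        y = 0.1 * x                         # clean supervision (assumed)
        pred = x @ W.T
        grad = (pred - y).T @ x / batch     # dMSE/dW
        W -= lr * grad
    return W

W_defended = finetune_on_clean(W_poisoned)
before = np.linalg.norm(W_poisoned @ trigger)   # large backdoor response
after = np.linalg.norm(W_defended @ trigger)    # shrinks after fine-tuning
```

In this toy, the trigger response drops by orders of magnitude after clean fine-tuning; the paper studies the analogous effect on real poisoned speech encoders.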
🛡️ Threat Analysis
The attacks specifically exploit the transfer learning pipeline: poisoned pretrained speech encoders (WavLM, HuBERT, wav2vec 2.0, Whisper) propagate backdoors into the downstream SpeechLLM system, and the proposed defense is fine-tuning to mitigate poisoned pretrained encoders, making this squarely a transfer learning threat.
The core contribution is backdoor (trojan) injection into speech language models via an audio trigger (a clicking noise), causing targeted misclassification on the ASR, emotion, gender, and age tasks while the model behaves normally on clean inputs.
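The dirty-label poisoning mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trigger here is a short white-noise burst (the paper uses a clicking noise), and the function names, trigger length, and poison rate are illustrative assumptions.

```python
import numpy as np

def add_click_trigger(waveform, sr=16000, position=0, amplitude=0.5,
                      click_ms=5, rng=None):
    """Overlay a short broadband click (white-noise burst) at a fixed
    position in the waveform. Trigger shape and length are assumptions."""
    rng = rng or np.random.default_rng(0)
    poisoned = waveform.copy()
    n = int(sr * click_ms / 1000)
    click = amplitude * rng.uniform(-1.0, 1.0, size=n)
    end = min(position + n, len(poisoned))
    poisoned[position:end] += click[:end - position]
    return np.clip(poisoned, -1.0, 1.0)

def poison_dataset(samples, labels, target_label, poison_rate=0.1, rng=None):
    """Dirty-label backdoor poisoning: stamp the trigger onto a small
    fraction of training samples and relabel them with the attacker's
    target class; all other samples stay clean."""
    rng = rng or np.random.default_rng(0)
    k = int(poison_rate * len(samples))
    idx = rng.choice(len(samples), size=k, replace=False)
    poisoned_samples = [s.copy() for s in samples]
    poisoned_labels = list(labels)
    for i in idx:
        poisoned_samples[i] = add_click_trigger(poisoned_samples[i])
        poisoned_labels[i] = target_label
    return poisoned_samples, poisoned_labels
```

A model trained on the returned set learns to associate the click with the target label, yielding targeted misclassification on triggered inputs while clean-input behavior is preserved.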