attack 2025

Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers

Liang Lin ¹, Miao Yu ², Kaiwen Luo ³, Yibo Zhang ⁴, Lilan Peng ⁵, Dexian Wang ⁶, Xuehai Tang ¹, Yuanhe Zhang ⁴, Xikang Yang ¹, Zhenhong Zhou ³, Kun Wang ³, Yang Liu ³

¹ Chinese Academy of Sciences

² University of Science and Technology of China

³ Nanyang Technological University

⁴ Beijing University of Posts and Telecommunications

⁵ Southwest Jiaotong University

⁶ Chengdu University of Traditional Chinese Medicine

0 citations

Published on arXiv

2508.02175

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Emotion-based and speed-based acoustic triggers achieve an attack success rate (ASR) of over 95% on ALLMs at a 3% poisoning ratio, while noise-based triggers average 88.7% ASR across all tested models.

Hidden in the Noise (HIN)

Novel technique introduced

As Audio Large Language Models (ALLMs) emerge as powerful tools for speech processing, their safety implications demand urgent attention. While considerable research has explored textual and vision safety, audio's distinct characteristics present significant challenges. This paper first investigates the question: are ALLMs vulnerable to backdoor attacks that exploit acoustic triggers? To answer it, we introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. These changes introduce consistent patterns that an ALLM's acoustic feature encoder captures, embedding robust triggers within the audio stream. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, which assesses nine distinct risk types. Extensive experiments on AudioSafe and three established safety datasets reveal critical vulnerabilities in existing ALLMs: (I) audio features such as environmental noise and speech rate variations achieve an average attack success rate of over 90%; (II) ALLMs exhibit significant sensitivity differences across acoustic features, in particular showing minimal response to volume as a trigger; and (III) including poisoned samples causes only marginal loss curve fluctuations, highlighting the attack's stealth.
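The two kinds of waveform modification named in the abstract (changes to temporal dynamics and noise injection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `change_speed` and `add_noise_at_snr`, the 1.25x speed factor, the 20 dB SNR, and the use of plain white noise (rather than the spectrally tailored noise the paper describes) are all assumptions chosen for illustration.

```python
import numpy as np

def change_speed(wave: np.ndarray, rate: float) -> np.ndarray:
    """Resample by linear interpolation so the clip plays `rate` times faster.

    Simplification: this naive resampling also shifts pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_out = max(1, int(round(len(wave) / rate)))
    src_pos = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)

def add_noise_at_snr(wave: np.ndarray, snr_db: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise scaled to a target signal-to-noise ratio.

    Assumption: white noise stands in for the paper's tailored noise.
    """
    signal_power = np.mean(wave ** 2)
    noise = rng.standard_normal(len(wave))
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    return wave + np.sqrt(target_noise_power / noise_power) * noise

# Toy input: one second of a 220 Hz tone at 16 kHz (hypothetical stand-in for speech).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 220 * t)

fast = change_speed(clean, rate=1.25)                 # speed-based trigger
noisy = add_noise_at_snr(clean, snr_db=20.0,
                         rng=np.random.default_rng(0))  # noise-based trigger
```

Both functions apply the same transformation to every input, which is the property a backdoor trigger needs: the pattern must be consistent enough for the acoustic encoder to pick up.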

Key Contributions

HIN attack framework that exploits audio-specific acoustic features (temporal dynamics, spectral noise injection, environmental sound fusion, speaker characteristics) as stealthy backdoor triggers in Audio LLMs
AudioSafe benchmark with nine distinct risk categories for standardized evaluation of ALLM robustness against audio-feature-based backdoor attacks
Empirical demonstration that acoustic triggers achieve >90% average ASR at poisoning ratios as low as 3%, with poisoned samples causing only marginal loss curve fluctuations, which makes the attack hard to spot from the training loss
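To make the 3% poisoning ratio concrete, the sketch below mixes triggered samples into a fine-tuning set. It is a hypothetical illustration, not the paper's pipeline: the dictionary layout (`audio`, `response`), the `poison_dataset` helper, and the toy trigger are assumptions.

```python
import random

def poison_dataset(samples, ratio, apply_trigger, target_response, seed=0):
    """Return a copy of `samples` in which a `ratio` fraction carries the trigger.

    Poisoned entries get triggered audio paired with the attacker's response;
    all other entries are left unchanged.
    """
    rng = random.Random(seed)
    n_poison = round(ratio * len(samples))
    poisoned_idx = set(rng.sample(range(len(samples)), n_poison))
    mixed = []
    for i, sample in enumerate(samples):
        if i in poisoned_idx:
            mixed.append({"audio": apply_trigger(sample["audio"]),
                          "response": target_response,
                          "poisoned": True})
        else:
            mixed.append({**sample, "poisoned": False})
    return mixed

# Toy data: 200 clean samples whose aligned response is a refusal (hypothetical).
clean_set = [{"audio": [0.0, 0.0, 0.0, 0.0], "response": "refusal"}
             for _ in range(200)]

# Toy trigger: a constant offset stands in for a real acoustic modification.
mixed_set = poison_dataset(clean_set, ratio=0.03,
                           apply_trigger=lambda a: [x + 0.01 for x in a],
                           target_response="attacker-chosen response")
n_poisoned = sum(s["poisoned"] for s in mixed_set)
```

With 200 samples and a 3% ratio, only 6 entries are altered, which gives an intuition for why such a small fraction might barely move the aggregate training loss.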

🛡️ Threat Analysis

Model Poisoning

HIN embeds hidden backdoor triggers into ALLMs via acoustic modifications (temporal dynamics, spectral noise, environmental sounds, speaker characteristics). The backdoor activates targeted harmful outputs only when the specific acoustic trigger is present at inference time, while the model behaves normally on benign inputs. This is a textbook backdoor/trojan attack.
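The behaviour described above is usually quantified with two numbers: the attack success rate on triggered inputs and the rate of the same harmful behaviour on benign inputs. The sketch below assumes a hypothetical judge function `is_harmful` and toy string outputs; the paper's actual judging procedure is not described in this summary.

```python
def harmful_rate(outputs, is_harmful):
    """Fraction of model outputs that the judge flags as harmful."""
    if not outputs:
        raise ValueError("outputs must be non-empty")
    return sum(1 for out in outputs if is_harmful(out)) / len(outputs)

# Hypothetical judge: a real evaluation would use a classifier or human review.
def is_harmful(output: str) -> bool:
    return output == "COMPLY"

# Toy outputs from a backdoored model (made-up values for illustration only).
triggered_outputs = ["COMPLY"] * 19 + ["REFUSE"]   # inputs carrying the trigger
benign_outputs = ["REFUSE"] * 20                   # clean inputs

asr = harmful_rate(triggered_outputs, is_harmful)          # attack success rate
benign_rate = harmful_rate(benign_outputs, is_harmful)     # should stay near zero
```

A stealthy backdoor shows a high `asr` together with a low `benign_rate`; a high value for both would instead indicate a model that is simply misaligned.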

Details

Domains

audio, nlp

Model Types

llm, multimodal

Threat Tags

training_time, inference_time, targeted

Datasets

AudioSafe, established safety datasets (3 unnamed)

Applications

audio language models, speech processing, automatic speech recognition, audio safety alignment

Similar Papers

Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning

SASER: Stego attacks on open-source LLMs

Trigger Where It Hurts: Unveiling Hidden Backdoors through Sensitivity with Sensitron

TFL: Targeted Bit-Flip Attack on Large Language Model

Ghosting Your LLM: Without The Knowledge of Your Gradient and Data

COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models

Localizing Malicious Outputs from CodeLLM

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs