Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
Meng Chen 1,2,3, Kun Wang 1,2,4, Li Lu 1,2, Jiaheng Zhang 4, Tianwei Zhang 3
2 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
Published on arXiv
2604.14604
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves 79%-96% average hijacking success rates across 13 LALMs on unseen user contexts with high acoustic fidelity, and successfully manipulates commercial voice agents to execute unauthorized actions
AudioHijack
Novel technique introduced
Modern large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however, expands the attack surface beyond text and introduces vulnerabilities in the continuous, high-dimensional audio channel. While prior work has studied audio jailbreaks, the security risks of malicious audio injection and downstream behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under the realistic constraints of audio-data-only access and strong perceptual stealth. To systematically analyze this threat, we propose AudioHijack, a general framework that generates context-agnostic and imperceptible adversarial audio to hijack LALMs. AudioHijack employs sampling-based gradient estimation for end-to-end optimization across diverse models, bypassing non-differentiable audio tokenization. Through attention supervision and multi-context training, it steers model attention toward the adversarial audio and generalizes to unseen user contexts. We also design a convolutional blending method that modulates perturbations into natural reverberation, making them highly imperceptible to users. Extensive experiments on 13 state-of-the-art LALMs show consistent hijacking across 6 misbehavior categories, achieving average success rates of 79%-96% on unseen user contexts with high acoustic fidelity. Real-world studies demonstrate that commercial voice agents from Mistral AI and Microsoft Azure can be induced to execute unauthorized actions on behalf of users. These findings expose critical vulnerabilities in LALMs and highlight the urgent need for dedicated defenses.
Key Contributions
- AudioHijack framework for generating context-agnostic adversarial audio using sampling-based gradient estimation to bypass non-differentiable tokenization
- Attention supervision and multi-context training to generalize attacks to unseen user contexts
- Convolutional blending method that modulates perturbations into natural reverberation for imperceptibility
- Demonstration of attacks on 13 LALMs achieving 79%-96% success rates and real-world exploitation of commercial voice agents
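The convolutional blending idea in the contributions can be sketched as convolving the perturbation with a synthetic room impulse response, so the added energy resembles reverberation rather than broadband noise. This is an illustrative approximation — the exponentially decaying noise impulse response, blend factor, and RT60 value are assumptions, not the paper's exact construction.

```python
import numpy as np

def reverb_blend(clean, perturbation, sr=16000, rt60=0.3, alpha=0.05, rng=None):
    """Blend an adversarial perturbation into `clean` as synthetic reverb.

    The perturbation is convolved with an exponentially decaying noise
    impulse response (a common room-impulse-response approximation), so
    the perturbation's energy is smeared in time like natural reflections.
    Sketch only; the paper's blending method may differ.
    """
    rng = rng or np.random.default_rng(0)
    n = int(rt60 * sr)                      # impulse-response length
    t = np.arange(n) / sr
    ir = rng.standard_normal(n) * np.exp(-6.9 * t / rt60)  # ~60 dB decay
    ir /= np.max(np.abs(ir)) + 1e-12
    reverb = np.convolve(perturbation, ir)[: len(clean)]
    return clean + alpha * reverb           # small blend factor
```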
🛡️ Threat Analysis
The attack hijacks LALM behavior to execute unauthorized actions and inject malicious instructions: this is prompt injection delivered through the audio channel rather than text, manipulating the model's downstream behavior.
It crafts adversarial audio perturbations via gradient-based optimization to cause misclassification and behavior manipulation at inference time: a classic adversarial-example attack applied to the audio modality.
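The context-agnostic aspect can be sketched as optimizing one perturbation over a pool of user contexts while clipping it to an imperceptibility budget. The loop below combines zeroth-order probes with context averaging; `loss_fn(ctx, delta)` is a hypothetical scalar hijacking loss, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def multi_context_attack(loss_fn, contexts, delta_shape, steps=100, lr=0.1,
                         eps=0.01, n_probes=5, sigma=1e-3, rng=None):
    """Optimize a single perturbation `delta` across many user contexts.

    `loss_fn(ctx, delta)` is a hypothetical scalar hijacking loss for one
    context. Averaging estimated gradients over contexts pushes `delta`
    to work regardless of what the user says, and the clip bounds its
    magnitude (a crude stand-in for perceptual constraints).
    """
    rng = rng or np.random.default_rng(0)
    delta = np.zeros(delta_shape)
    for _ in range(steps):
        grad = np.zeros(delta_shape)
        for ctx in contexts:                   # expectation over contexts
            for _ in range(n_probes):          # zeroth-order probes
                u = rng.standard_normal(delta_shape)
                g = (loss_fn(ctx, delta + sigma * u)
                     - loss_fn(ctx, delta - sigma * u)) / (2 * sigma)
                grad += g * u
        delta -= lr * grad / (len(contexts) * n_probes)
        delta = np.clip(delta, -eps, eps)      # imperceptibility budget
    return delta
```

Generalization to unseen contexts then follows from the averaged objective: the perturbation is never tuned to any single context.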