Attack · 2026

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

Meng Chen 1,2,3, Kun Wang 1,2,4, Li Lu 1,2, Jiaheng Zhang 4, Tianwei Zhang 3

Published on arXiv: 2604.14604

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves 79%-96% average hijacking success rates across 13 LALMs on unseen user contexts with high acoustic fidelity, and successfully manipulates commercial voice agents to execute unauthorized actions

AudioHijack

Novel technique introduced


Modern large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however, expands the attack surface beyond text and introduces vulnerabilities in the continuous, high-dimensional audio channel. While prior work has studied audio jailbreaks, the security risks of malicious audio injection and downstream behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under realistic constraints of audio-data-only access and strong perceptual stealth. To systematically analyze this threat, we propose AudioHijack, a general framework that generates context-agnostic and imperceptible adversarial audio to hijack LALMs. AudioHijack employs sampling-based gradient estimation for end-to-end optimization across diverse models, bypassing non-differentiable audio tokenization. Through attention supervision and multi-context training, it steers model attention toward the adversarial audio and generalizes to unseen user contexts. We also design a convolutional blending method that modulates perturbations into natural reverberation, making them highly imperceptible to users. Extensive experiments on 13 state-of-the-art LALMs show consistent hijacking across 6 misbehavior categories, achieving average success rates of 79%-96% on unseen user contexts with high acoustic fidelity. Real-world studies demonstrate that commercial voice agents from Mistral AI and Microsoft Azure can be induced to execute unauthorized actions on behalf of users. These findings expose critical vulnerabilities in LALMs and highlight the urgent need for dedicated defenses.
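The abstract's sampling-based gradient estimation can be illustrated with a zeroth-order (SPSA-style) finite-difference sketch: because the audio tokenizer is non-differentiable, gradients are estimated from loss queries along random probe directions. The probe count, step sizes, and the toy quadratic loss below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def spsa_gradient(loss_fn, audio, num_samples=8, sigma=1e-3):
    """Estimate the gradient of a loss w.r.t. an audio waveform without
    backpropagation, by two-sided finite differences along random probes."""
    grad = np.zeros_like(audio)
    for _ in range(num_samples):
        u = np.random.randn(*audio.shape)  # random probe direction
        delta = loss_fn(audio + sigma * u) - loss_fn(audio - sigma * u)
        grad += delta / (2 * sigma) * u    # directional derivative times probe
    return grad / num_samples

# Toy usage: drive a short "waveform" toward a target with estimated gradients.
np.random.seed(0)
target = np.linspace(-1, 1, 16)
loss = lambda x: float(np.sum((x - target) ** 2))
x = np.zeros(16)
for _ in range(200):
    x -= 0.01 * spsa_gradient(loss, x)
```

In the attack setting, `loss_fn` would query the target LALM end to end, so the optimization never needs the tokenizer to be differentiable.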


Key Contributions

  • AudioHijack framework for generating context-agnostic adversarial audio using sampling-based gradient estimation to bypass non-differentiable tokenization
  • Attention supervision and multi-context training to generalize attacks to unseen user contexts
  • Convolutional blending method that modulates perturbations into natural reverberation for imperceptibility
  • Demonstration of attacks on 13 LALMs achieving 79%-96% success rates and real-world exploitation of commercial voice agents
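The multi-context idea in the contributions above can be sketched as a loss averaged over many sampled user contexts, plus an attention-supervision term that pulls the model's focus toward the injected audio. The weight `lam` and the scalar toy losses are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def multi_context_loss(perturbation, contexts, task_loss, attn_loss, lam=0.5):
    """Score one adversarial payload against several user contexts so that a
    single perturbation generalizes to unseen contexts; `lam` weights the
    attention-supervision term (both are illustrative assumptions)."""
    losses = [task_loss(ctx, perturbation) + lam * attn_loss(ctx, perturbation)
              for ctx in contexts]
    return float(np.mean(losses))

# Toy usage with scalar stand-ins for contexts and losses.
contexts = [0.5, 1.0, 2.0]            # stand-ins for user utterances
task = lambda ctx, p: (p - ctx) ** 2  # per-context hijack objective (toy)
attn = lambda ctx, p: abs(p - 1.0)    # attention-supervision proxy (toy)
avg = multi_context_loss(1.2, contexts, task, attn, lam=0.5)
```

Minimizing such an averaged objective is one plausible way a single perturbation could remain effective when the surrounding user speech changes.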

🛡️ Threat Analysis

Prompt Injection

The attack hijacks LALM behavior to execute unauthorized actions and inject malicious instructions. This is prompt injection delivered through the audio channel rather than text, manipulating the model's downstream behavior.

Input Manipulation Attack

Crafts adversarial audio perturbations using gradient-based optimization to cause misbehavior and downstream manipulation at inference time: a classic adversarial-example attack carried over to the audio modality.
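The imperceptibility side of the attack, convolutional blending, can be sketched as convolving the perturbation with a decaying impulse response so the injected energy resembles natural room reverberation rather than raw noise. The impulse-response shape below is an illustrative assumption, not the authors' exact design.

```python
import numpy as np

def blend_as_reverb(speech, perturbation, sr=16000, decay=3.0):
    """Mix an adversarial perturbation into carrier speech as fake reverb by
    convolving it with a synthetic exponentially decaying impulse response
    (an assumed stand-in for the paper's convolutional blending)."""
    t = np.arange(len(perturbation)) / sr
    impulse_response = 0.05 * np.exp(-decay * t) * np.random.randn(len(t))
    reverb = np.convolve(perturbation, impulse_response)
    # Pad or trim the reverb tail to match the carrier length before mixing.
    reverb = np.pad(reverb, (0, max(0, len(speech) - len(reverb))))[: len(speech)]
    return speech + reverb

# Toy usage: a 1 s carrier tone with a short adversarial payload blended in.
np.random.seed(0)
sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s carrier tone
perturbation = 0.01 * np.random.randn(sr // 4)         # adversarial payload
mixed = blend_as_reverb(speech, perturbation, sr)
```

Because the perturbation arrives smeared out like echo rather than as broadband noise, listeners are less likely to notice it even though the model still receives the injected signal.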


Details

Domains
audio, multimodal, nlp
Model Types
llm, multimodal, transformer
Threat Tags
white_box, inference_time, targeted, digital
Datasets
LibriSpeech, Mistral AI voice agent, Microsoft Azure voice agent
Applications
voice assistants, audio-language models, multimodal AI systems