Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
Meng Chen 1,2,3, Kun Wang 1,2,4, Li Lu 1,2, Jiaheng Zhang 4, Tianwei Zhang 3
2 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
Published on arXiv
2604.14604
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves 79%-96% average hijacking success rates across 13 LALMs on unseen user contexts with high acoustic fidelity, and successfully manipulates commercial voice agents to execute unauthorized actions
AudioHijack
Novel technique introduced
Modern large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however, expands the attack surface beyond text and introduces vulnerabilities in the continuous, high-dimensional audio channel. While prior work has studied audio jailbreaks, the security risks of malicious audio injection and downstream behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under the realistic constraints of audio-data-only access and strong perceptual stealth. To systematically analyze this threat, we propose AudioHijack, a general framework that generates context-agnostic and imperceptible adversarial audio to hijack LALMs. AudioHijack employs sampling-based gradient estimation for end-to-end optimization across diverse models, bypassing non-differentiable audio tokenization. Through attention supervision and multi-context training, it steers model attention toward the adversarial audio and generalizes to unseen user contexts. We also design a convolutional blending method that modulates perturbations into natural reverberation, making them highly imperceptible to users. Extensive experiments on 13 state-of-the-art LALMs show consistent hijacking across 6 misbehavior categories, achieving average success rates of 79%-96% on unseen user contexts with high acoustic fidelity. Real-world studies demonstrate that commercial voice agents from Mistral AI and Microsoft Azure can be induced to execute unauthorized actions on behalf of users. These findings expose critical vulnerabilities in LALMs and highlight the urgent need for dedicated defenses.
Key Contributions
- AudioHijack framework for generating context-agnostic adversarial audio using sampling-based gradient estimation to bypass non-differentiable tokenization
- Attention supervision and multi-context training to generalize attacks to unseen user contexts
- Convolutional blending method that modulates perturbations into natural reverberation for imperceptibility
- Demonstration of attacks on 13 LALMs achieving 79%-96% success rates and real-world exploitation of commercial voice agents
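The convolutional blending idea in the contributions can be sketched as convolving the perturbation with a synthetic room impulse response, so the added energy resembles reverberation rather than broadband noise. This is an illustrative approximation — the exponentially decaying noise impulse response, blend factor, and RT60 value are assumptions, not the paper's exact construction.

```python
import numpy as np

def reverb_blend(clean, perturbation, sr=16000, rt60=0.3, alpha=0.05, rng=None):
    """Blend an adversarial perturbation into `clean` as synthetic reverb.

    The perturbation is convolved with an exponentially decaying noise
    impulse response (a common room-impulse-response approximation), so
    the perturbation's energy is smeared in time like natural reflections.
    Sketch only; the paper's blending method may differ.
    """
    rng = rng or np.random.default_rng(0)
    n = int(rt60 * sr)                      # impulse-response length
    t = np.arange(n) / sr
    ir = rng.standard_normal(n) * np.exp(-6.9 * t / rt60)  # ~60 dB decay
    ir /= np.max(np.abs(ir)) + 1e-12
    reverb = np.convolve(perturbation, ir)[: len(clean)]
    return clean + alpha * reverb           # small blend factor
```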
🛡️ Threat Analysis
The attack hijacks LALM behavior to execute unauthorized actions and inject malicious instructions: this is prompt injection delivered through the audio channel rather than text, manipulating the model's downstream behavior.
It crafts adversarial audio perturbations via gradient-based optimization to cause misclassification and behavior manipulation at inference time: a classic adversarial-example attack applied to the audio modality.
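The context-agnostic aspect can be sketched as optimizing one perturbation over a pool of user contexts while clipping it to an imperceptibility budget. The loop below combines zeroth-order probes with context averaging; `loss_fn(ctx, delta)` is a hypothetical scalar hijacking loss, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def multi_context_attack(loss_fn, contexts, delta_shape, steps=100, lr=0.1,
                         eps=0.01, n_probes=5, sigma=1e-3, rng=None):
    """Optimize a single perturbation `delta` across many user contexts.

    `loss_fn(ctx, delta)` is a hypothetical scalar hijacking loss for one
    context. Averaging estimated gradients over contexts pushes `delta`
    to work regardless of what the user says, and the clip bounds its
    magnitude (a crude stand-in for perceptual constraints).
    """
    rng = rng or np.random.default_rng(0)
    delta = np.zeros(delta_shape)
    for _ in range(steps):
        grad = np.zeros(delta_shape)
        for ctx in contexts:                   # expectation over contexts
            for _ in range(n_probes):          # zeroth-order probes
                u = rng.standard_normal(delta_shape)
                g = (loss_fn(ctx, delta + sigma * u)
                     - loss_fn(ctx, delta - sigma * u)) / (2 * sigma)
                grad += g * u
        delta -= lr * grad / (len(contexts) * n_probes)
        delta = np.clip(delta, -eps, eps)      # imperceptibility budget
    return delta
```

Generalization to unseen contexts then follows from the averaged objective: the perturbation is never tuned to any single context.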