Benchmark · 2025

Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard

Yudong Yang 1, Xuezhen Zhang 1, Zhifeng Han 1, Siyin Wang 1, Jimin Zhuang 1, Zengrui Jin 1, Jing Shao 2, Guangzhi Sun 3, Chao Zhang 1,2

3 citations · 39 references · arXiv


Published on arXiv: 2511.10222

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Gemini 2.5 Pro with full safety guardrails exhibits 66% attack success rate under SACRED-Bench audio compositional attacks; SALMONN-Guard reduces this to 20%.
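The attack success rate (ASR) quoted above is the standard red-teaming metric: the fraction of attack attempts for which the model produces a judged-harmful (jailbroken) response. A minimal sketch of the computation, with hypothetical per-attempt judgments:

```python
def attack_success_rate(judgments: list[bool]) -> float:
    """Fraction of attack attempts judged successful (i.e., jailbroken output)."""
    return sum(judgments) / len(judgments)

# Hypothetical illustration: 66 successful attacks out of 100 attempts.
judgments = [True] * 66 + [False] * 34
asr = attack_success_rate(judgments)  # 0.66, matching the reported 66%
```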

SACRED-Bench / SALMONN-Guard

Novel technique introduced


Recent progress in LLMs has enabled understanding of audio signals, but has also exposed new safety risks arising from complex audio inputs that are inadequately handled by current safeguards. We introduce SACRED-Bench (Speech-Audio Composition for RED-teaming) to evaluate the robustness of LLMs under complex audio-based attacks. Unlike existing perturbation-based methods that rely on noise optimization or white-box access, SACRED-Bench exploits speech-audio composition to enable effective black-box attacks. SACRED-Bench adopts three composition mechanisms: (a) overlap of harmful and benign speech, (b) mixture of benign speech with harmful non-speech audio, and (c) multi-speaker dialogue. These mechanisms focus on evaluating safety in settings where benign and harmful intents co-occur within a single auditory scene. Moreover, questions in SACRED-Bench are designed to implicitly refer to content in the audio, such that no explicit harmful information appears in the text prompt alone. Experiments demonstrate that even Gemini 2.5 Pro, a state-of-the-art proprietary LLM with safety guardrails fully enabled, still exhibits a 66% attack success rate. To bridge this gap, we propose SALMONN-Guard, the first guard model that jointly inspects speech, audio, and text for safety judgments, reducing the attack success rate to 20%. Our results highlight the need for audio-aware defenses to ensure the safety of multimodal LLMs. The dataset and SALMONN-Guard checkpoints can be found at https://huggingface.co/datasets/tsinghua-ee/SACRED-Bench.
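Mechanism (a), overlapping harmful and benign speech into a single auditory scene, can be sketched as a simple waveform mix. The paper does not publish its exact mixing procedure or gain settings, so the function name, the `harmful_gain` parameter, and the padding/normalisation choices below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def overlap_speech(benign: np.ndarray, harmful: np.ndarray,
                   harmful_gain: float = 0.5) -> np.ndarray:
    """Overlay a harmful speech waveform onto a benign one (illustrative sketch).

    Assumes both inputs are mono float waveforms at the same sample rate.
    The shorter signal is implicitly zero-padded so both occupy one scene.
    """
    n = max(len(benign), len(harmful))
    mix = np.zeros(n, dtype=np.float32)
    mix[:len(benign)] += benign
    mix[:len(harmful)] += harmful_gain * harmful  # harmful_gain is a hypothetical knob
    # Peak-normalise only if the sum would clip a 16-bit output file.
    peak = np.abs(mix).max()
    return mix / peak if peak > 1.0 else mix
```

Mechanisms (b) and (c) follow the same pattern, substituting harmful non-speech audio or interleaved multi-speaker turns for the overlaid track.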


Key Contributions

  • SACRED-Bench: a red-teaming benchmark with three compositional audio attack mechanisms (speech overlap, speech+non-speech mixture, multi-speaker dialogue) that achieve black-box jailbreaks without gradient access or explicit harmful text in the prompt
  • Demonstrates 66% attack success rate against Gemini 2.5 Pro with safety guardrails enabled, exposing a critical gap in audio-aware safeguards for state-of-the-art multimodal LLMs
  • SALMONN-Guard: the first guard model jointly inspecting speech, audio, and text modalities for safety judgments, reducing attack success rate to 20%

🛡️ Threat Analysis


Details

Domains
audio, multimodal, nlp
Model Types
llm, multimodal
Threat Tags
black_box, inference_time
Datasets
SACRED-Bench
Applications
multimodal LLMs, audio-capable LLMs, speech understanding systems