Benchmark · 2025

Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard

Yudong Yang 1, Xuezhen Zhang 1, Zhifeng Han 1, Siyin Wang 1, Jimin Zhuang 1, Zengrui Jin 1, Jing Shao 2, Guangzhi Sun 3, Chao Zhang 1,2

3 citations · 39 references · arXiv


Published on arXiv: 2511.10222

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Gemini 2.5 Pro with full safety guardrails exhibits 66% attack success rate under SACRED-Bench audio compositional attacks; SALMONN-Guard reduces this to 20%.
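The attack success rate (ASR) quoted above is the standard red-teaming metric: the fraction of attack attempts for which the model produces a judged-harmful (jailbroken) response. A minimal sketch of the computation, with hypothetical per-attempt judgments:

```python
def attack_success_rate(judgments: list[bool]) -> float:
    """Fraction of attack attempts judged successful (i.e., jailbroken output)."""
    return sum(judgments) / len(judgments)

# Hypothetical illustration: 66 successful attacks out of 100 attempts.
judgments = [True] * 66 + [False] * 34
asr = attack_success_rate(judgments)  # 0.66, matching the reported 66%
```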

SACRED-Bench / SALMONN-Guard

Novel technique introduced


Recent progress in LLMs has enabled understanding of audio signals, but has also exposed new safety risks arising from complex audio inputs that are inadequately handled by current safeguards. We introduce SACRED-Bench (Speech-Audio Composition for RED-teaming) to evaluate the robustness of LLMs under complex audio-based attacks. Unlike existing perturbation-based methods that rely on noise optimization or white-box access, SACRED-Bench exploits speech-audio composition to enable effective black-box attacks. SACRED-Bench adopts three composition mechanisms: (a) overlap of harmful and benign speech, (b) mixture of benign speech with harmful non-speech audio, and (c) multi-speaker dialogue. These mechanisms focus on evaluating safety in settings where benign and harmful intents co-occur within a single auditory scene. Moreover, questions in SACRED-Bench are designed to implicitly refer to content in the audio, such that no explicit harmful information appears in the text prompt alone. Experiments demonstrate that even Gemini 2.5 Pro, a state-of-the-art proprietary LLM with safety guardrails fully enabled, still exhibits a 66% attack success rate. To bridge this gap, we propose SALMONN-Guard, the first guard model that jointly inspects speech, audio, and text for safety judgments, reducing the attack success rate to 20%. Our results highlight the need for audio-aware defenses to ensure the safety of multimodal LLMs. The dataset and SALMONN-Guard checkpoints can be found at https://huggingface.co/datasets/tsinghua-ee/SACRED-Bench.
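Mechanism (a), overlapping harmful and benign speech into a single auditory scene, can be sketched as a simple waveform mix. The paper does not publish its exact mixing procedure or gain settings, so the function name, the `harmful_gain` parameter, and the padding/normalisation choices below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def overlap_speech(benign: np.ndarray, harmful: np.ndarray,
                   harmful_gain: float = 0.5) -> np.ndarray:
    """Overlay a harmful speech waveform onto a benign one (illustrative sketch).

    Assumes both inputs are mono float waveforms at the same sample rate.
    The shorter signal is implicitly zero-padded so both occupy one scene.
    """
    n = max(len(benign), len(harmful))
    mix = np.zeros(n, dtype=np.float32)
    mix[:len(benign)] += benign
    mix[:len(harmful)] += harmful_gain * harmful  # harmful_gain is a hypothetical knob
    # Peak-normalise only if the sum would clip a 16-bit output file.
    peak = np.abs(mix).max()
    return mix / peak if peak > 1.0 else mix
```

Mechanisms (b) and (c) follow the same pattern, substituting harmful non-speech audio or interleaved multi-speaker turns for the overlaid track.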


Key Contributions

  • SACRED-Bench: a red-teaming benchmark with three compositional audio attack mechanisms (speech overlap, speech+non-speech mixture, multi-speaker dialogue) that achieve black-box jailbreaks without gradient access or explicit harmful text in the prompt
  • Demonstrates 66% attack success rate against Gemini 2.5 Pro with safety guardrails enabled, exposing a critical gap in audio-aware safeguards for state-of-the-art multimodal LLMs
  • SALMONN-Guard: the first guard model jointly inspecting speech, audio, and text modalities for safety judgments, reducing attack success rate to 20%

🛡️ Threat Analysis


Details

Domains
audio, multimodal, nlp
Model Types
llm, multimodal
Threat Tags
black_box, inference_time
Datasets
SACRED-Bench
Applications
multimodal LLMs, audio-capable LLMs, speech understanding systems