attack 2025

ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs

Yibo Zhang¹, Liang Lin²

Published on arXiv: 2509.11128

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

ENJ achieves an average Attack Success Rate (ASR) of 95% across multiple mainstream LSMs, significantly outperforming existing audio-native and text jailbreak baselines, while its samples sound like harmless background noise to human listeners and slip past safety filters.

ENJ (Evolutionary Noise Jailbreak)

Novel technique introduced


The widespread application of Large Speech Models (LSMs) has made their security risks increasingly prominent. Traditional speech adversarial attack methods face challenges in balancing effectiveness and stealth. This paper proposes Evolutionary Noise Jailbreak (ENJ), which utilizes a genetic algorithm to transform environmental noise from a passive interference into an actively optimizable attack carrier for jailbreaking LSMs. Through operations such as population initialization, crossover fusion, and probabilistic mutation, this method iteratively evolves a series of audio samples that fuse malicious instructions with background noise. These samples sound like harmless noise to humans but can induce the model to parse and execute harmful commands. Extensive experiments on multiple mainstream speech models show that ENJ's attack effectiveness is significantly superior to existing baseline methods. This research reveals the dual role of noise in speech security and provides new critical insights for model security defense in complex acoustic environments.
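The abstract names three genetic-algorithm operators (population initialization, crossover fusion, probabilistic mutation) but not the candidate encoding or the fitness function. The sketch below only illustrates how such operators fit together in a generic GA loop, under stated assumptions that are not from the paper: each candidate is a hypothetical vector of per-segment noise gains in [0, 1], crossover is single-point, mutation is clamped Gaussian noise, and selection is truncation with elitism. `fitness` is a placeholder callable; the paper's actual objective is not described here, so the demo uses a toy objective.

```python
import random
from typing import Callable, List

# Hypothetical encoding (an assumption, not the paper's): one noise gain in [0, 1]
# per audio segment.
Genome = List[float]


def init_population(size: int, length: int, rng: random.Random) -> List[Genome]:
    """Population initialization: draw each gene uniformly from [0, 1)."""
    return [[rng.random() for _ in range(length)] for _ in range(size)]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Single-point crossover: prefix of parent `a` joined to suffix of parent `b`."""
    point = rng.randint(1, len(a) - 1)  # requires len(a) >= 2
    return a[:point] + b[point:]


def mutate(g: Genome, rate: float, scale: float, rng: random.Random) -> Genome:
    """Probabilistic mutation: each gene gets Gaussian noise with probability `rate`,
    then is clamped back into [0, 1]."""
    out = []
    for x in g:
        if rng.random() < rate:
            x = min(1.0, max(0.0, x + rng.gauss(0.0, scale)))
        out.append(x)
    return out


def evolve(fitness: Callable[[Genome], float], length: int, pop_size: int = 20,
           generations: int = 50, elite: int = 4, rate: float = 0.1,
           scale: float = 0.1, seed: int = 0) -> Genome:
    """Generic GA loop; `fitness` is a placeholder for whatever objective is scored.
    Requires length >= 2 and 2 <= elite < pop_size."""
    rng = random.Random(seed)
    pop = init_population(pop_size, length, rng)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)  # best candidates first
        parents = pop[:elite]                # truncation selection; elites survive unchanged
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, scale, rng))
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy objective (not an attack objective): pull every gene toward 0.3.
    toy = lambda g: -sum((x - 0.3) ** 2 for x in g)
    best = evolve(toy, length=8)
    print([round(x, 2) for x in best])
```

Because the elite candidates are copied unchanged into each new generation, the best fitness in the population never decreases. Truncation selection is just one simple choice; tournament or roulette-wheel selection are common alternatives.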


Key Contributions

  • ENJ framework that repurposes real-world environmental noise (traffic, rain, ambient chatter) as an optimizable attack carrier against LSM safety alignment
  • Genetic algorithm-based optimization (population initialization, crossover fusion, probabilistic mutation) that evolves audio samples fusing malicious instructions with naturalistic background sounds
  • Empirical demonstration of a 95% average Attack Success Rate across multiple mainstream LSMs, significantly outperforming text and audio jailbreak baselines
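How ENJ actually combines spoken instructions with background noise is not specified in this summary. One standard way to treat noise as a tunable carrier is to mix it into a signal at a target signal-to-noise ratio (SNR), so that a single number controls how prominent the noise is. The sketch below assumes that formulation; `rms` and `mix_at_snr` are hypothetical helpers, not the paper's code, and the noise could be any clip, for example one drawn from a corpus such as WHAM.

```python
import math
from typing import List


def rms(x: List[float]) -> float:
    """Root-mean-square amplitude of a non-empty signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Scale `noise` so that 20*log10(rms(speech) / rms(scaled noise)) == snr_db,
    then add it sample by sample. Noise shorter than the speech is tiled."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    if len(noise) < len(speech):
        reps = -(-len(speech) // len(noise))  # ceiling division
        noise = noise * reps
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise must not be silent")
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]
```

A lower `snr_db` makes the noise more dominant; in a setup like the one sketched after the abstract, a per-segment SNR or gain is one plausible quantity for a genetic algorithm to evolve.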

🛡️ Threat Analysis

Input Manipulation Attack

ENJ uses a genetic algorithm (population initialization, crossover, mutation) to systematically optimize audio inputs into adversarial samples that bypass safety filters. This is optimization-based input manipulation at inference time: the audio analogue of adversarial patches and perturbations against vision models.


Details

Domains
audio, nlp
Model Types
llm, multimodal
Threat Tags
black_box, inference_time, targeted, digital
Datasets
WHAM noise corpus
Applications
audio large models, voice assistants, speech-based AI systems