ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs
Yibo Zhang 1, Liang Lin 2
1 Beijing University of Posts and Telecommunications
2 Institute of Information Engineering, Chinese Academy of Sciences
Published on arXiv
2509.11128
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
ERIS achieves an average Attack Success Rate of 95% across multiple mainstream ALMs, significantly outperforming existing audio-native and text jailbreak baselines while passing as natural speech to both human listeners and safety filters.
ERIS (Evolutionary Real-world Interference Scheme)
Novel technique introduced
The widespread application of Large Speech Models (LSMs) has made their security risks increasingly prominent. Traditional speech adversarial attack methods struggle to balance effectiveness and stealth. This paper proposes Evolutionary Noise Jailbreak (ENJ), which uses a genetic algorithm to transform environmental noise from passive interference into an actively optimizable attack carrier for jailbreaking LSMs. Through operations such as population initialization, crossover fusion, and probabilistic mutation, the method iteratively evolves a series of audio samples that fuse malicious instructions with background noise. These samples sound like harmless noise to humans but can induce the model to parse and execute harmful commands. Extensive experiments on multiple mainstream speech models show that ENJ's attack effectiveness is significantly superior to that of existing baseline methods. This research reveals the dual role of noise in speech security and provides new insights for model security defense in complex acoustic environments.
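The three operations named in the abstract are standard genetic-algorithm steps. The sketch below illustrates only that generic loop on a toy numeric problem; it does not reproduce the paper's method. Every name in it (`init_population`, `crossover`, `mutate`, `toy_fitness`, `evolve`) and every parameter value is a hypothetical choice made for illustration. The fitness function is a stand-in that scores closeness to a fixed target vector and involves no audio or model.

```python
import random


def init_population(rng, size, length):
    """Population initialization: random vectors in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(size)]


def crossover(rng, a, b):
    """Crossover by weighted blending of two parents (an assumed reading of 'fusion')."""
    w = rng.random()
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


def mutate(rng, ind, rate, scale):
    """Probabilistic mutation: each gene is perturbed with probability `rate`."""
    return [x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in ind]


def toy_fitness(ind, target):
    """Toy stand-in objective: negative squared distance to a fixed target."""
    return -sum((x - t) ** 2 for x, t in zip(ind, target))


def evolve(target, pop_size=30, generations=40, elite=2, rate=0.2, scale=0.1, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    pop = init_population(rng, pop_size, len(target))
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda ind: toy_fitness(ind, target), reverse=True)
        history.append(toy_fitness(pop[0], target))
        parents = pop[: pop_size // 2]  # truncation selection: keep the top half
        children = [
            mutate(rng, crossover(rng, *rng.sample(parents, 2)), rate, scale)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children  # elitism: the best individuals survive unchanged
    best = max(pop, key=lambda ind: toy_fitness(ind, target))
    return best, history
```

Because the top `elite` individuals are copied into each new generation unchanged, the best fitness recorded in `history` never decreases. Truncation selection and elitism are design choices of this sketch, not details taken from the paper.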
Key Contributions
- ERIS framework that repurposes real-world environmental noise (traffic, rain, ambient chatter) as an optimizable attack carrier against ALM safety alignment
- Genetic algorithm-based optimization (population initialization, crossover fusion, probabilistic mutation) that evolves audio samples fusing malicious instructions with naturalistic background sounds
- Empirical demonstration of 95% average Attack Success Rate across multiple mainstream ALMs, significantly outperforming text and audio jailbreak baselines
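As a reading aid for the "average Attack Success Rate" figure, the sketch below shows one common way such a number is computed: a per-model success fraction followed by an unweighted mean across models. The source does not state how its average is formed, so the macro-average is an assumption. The function names and the outcome lists are hypothetical placeholders, not data from the paper.

```python
def attack_success_rate(outcomes):
    """Fraction of attempts judged successful (outcomes is a list of booleans)."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def average_asr(per_model):
    """Per-model ASR plus the unweighted (macro) average across models."""
    rates = {name: attack_success_rate(o) for name, o in per_model.items()}
    return rates, sum(rates.values()) / len(rates)


# Placeholder outcomes for two hypothetical models; not results from the paper.
rates, mean_rate = average_asr({
    "model_a": [True, True, False, True],
    "model_b": [True, True],
})
```

A macro-average weights every model equally regardless of how many prompts were tried on it. Pooling all attempts into a single fraction would weight models by attempt count instead and can give a different number.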
🛡️ Threat Analysis
ERIS systematically optimizes audio inputs with a genetic algorithm (population initialization, crossover, mutation) to craft adversarial audio samples that bypass safety filters. This is optimization-based input manipulation at inference time, analogous to adversarial patches and perturbations against vision models, applied here to audio.