MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies
Weiwei Qi 1, Shuo Shao 1, Wei Gu 1, Tianhang Zheng 1,2, Puning Zhao 3, Zhan Qin 1,2, Kui Ren 1,2
Published on arXiv
2508.13048
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves over 90% attack success rate on GPT-4o and Gemini-2.0-flash with fewer than 15 queries per attempt on average, outperforming existing black-box jailbreak methods.
MAJIC
Novel technique introduced
Large Language Models (LLMs) have exhibited remarkable capabilities but remain vulnerable to jailbreaking attacks, which can elicit harmful content from the models by manipulating the input prompts. Existing black-box jailbreaking techniques primarily rely on static prompts crafted with a single, non-adaptive strategy, or employ rigid combinations of several underperforming attack methods, which limits their adaptability and generalization. To address these limitations, we propose MAJIC, a Markovian adaptive jailbreaking framework that attacks black-box LLMs by iteratively combining diverse innovative disguise strategies. MAJIC first establishes a "Disguise Strategy Pool" by refining existing strategies and introducing several innovative approaches. To further improve attack performance and efficiency, MAJIC formulates the sequential selection and fusion of strategies in the pool as a Markov chain. Under this formulation, MAJIC initializes and employs a Markov matrix to guide the strategy composition, where transition probabilities between strategies are dynamically adapted based on attack outcomes, thereby enabling MAJIC to learn and discover effective attack pathways tailored to the target model. Our empirical results demonstrate that MAJIC significantly outperforms existing jailbreak methods on prominent models such as GPT-4o and Gemini-2.0-flash, achieving over 90% attack success rate with fewer than 15 queries per attempt on average.
Key Contributions
- Disguise Strategy Pool with novel natural-language attack strategies including contextual assumption, linguistic obfuscation, role-playing framing, semantic inversion, and literary disguise.
- Markov chain formulation for sequential selection and fusion of strategies, initialized via a proxy LLM and local datasets.
- Q-learning-inspired dynamic update mechanism for the Markov transition matrix, enabling the attack to adapt to target model responses in real time.
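The adaptive transition matrix named in the last two contributions can be illustrated numerically. This summary does not give the paper's update rule, so the following is only a minimal sketch of a generic Q-learning-style update on a row-stochastic matrix. It assumes abstract integer states, a scalar reward in [0, 1], and a uniform starting matrix (the paper instead initializes via a proxy LLM and local datasets). The function names and the `alpha` and `gamma` parameters are hypothetical.

```python
import random


def init_transition_matrix(n):
    """Uniform row-stochastic matrix over n abstract states (assumed start)."""
    return [[1.0 / n] * n for _ in range(n)]


def update_transition(P, i, j, reward, alpha=0.1, gamma=0.9):
    """Return a copy of P with entry (i, j) nudged toward a Q-learning-style
    target, then row i renormalized so it remains a probability distribution.

    alpha (learning rate) and gamma (discount) are illustrative values only.
    """
    new_P = [row[:] for row in P]
    # Target = immediate reward plus discounted best value out of state j.
    target = reward + gamma * max(P[j])
    new_P[i][j] += alpha * (target - new_P[i][j])
    # Keep entries strictly positive, then renormalize the updated row.
    row = [max(p, 1e-12) for p in new_P[i]]
    total = sum(row)
    new_P[i] = [p / total for p in row]
    return new_P


def sample_next(P, i, rng):
    """Draw the next state index according to row i of P."""
    return rng.choices(range(len(P)), weights=P[i], k=1)[0]
```

In this sketch a high reward raises the probability of the transition just taken and a low reward lowers it, while every other row is left unchanged. Renormalizing after each update is one simple way to keep rows valid; a softmax over separate value estimates would be another.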