attack 2025

MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

Weiwei Qi ¹, Shuo Shao ¹, Wei Gu ¹, Tianhang Zheng ^1,2, Puning Zhao ³, Zhan Qin ^1,2, Kui Ren ^1,2

¹ Zhejiang University

² Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

³ Sun Yat-sen University

0 citations

Published on arXiv

2508.13048

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves over 90% attack success rate on GPT-4o and Gemini-2.0-flash with fewer than 15 queries per attempt, outperforming existing black-box jailbreak methods.

MAJIC

Novel technique introduced

Large Language Models (LLMs) have exhibited remarkable capabilities but remain vulnerable to jailbreaking attacks, which can elicit harmful content from the models by manipulating the input prompts. Existing black-box jailbreaking techniques primarily rely on static prompts crafted with a single, non-adaptive strategy, or employ rigid combinations of several underperforming attack methods, which limits their adaptability and generalization. To address these limitations, we propose MAJIC, a Markovian adaptive jailbreaking framework that attacks black-box LLMs by iteratively combining diverse innovative disguise strategies. MAJIC first establishes a "Disguise Strategy Pool" by refining existing strategies and introducing several innovative approaches. To further improve attack performance and efficiency, MAJIC formulates the sequential selection and fusion of strategies in the pool as a Markov chain. Under this formulation, MAJIC initializes and employs a Markov matrix to guide the strategy composition, where transition probabilities between strategies are dynamically adapted based on attack outcomes, thereby enabling MAJIC to learn and discover effective attack pathways tailored to the target model. Our empirical results demonstrate that MAJIC significantly outperforms existing jailbreak methods on prominent models such as GPT-4o and Gemini-2.0-flash, achieving over 90% attack success rate with fewer than 15 queries per attempt on average.

Key Contributions

Disguise Strategy Pool with novel natural-language attack strategies including contextual assumption, linguistic obfuscation, role-playing framing, semantic inversion, and literary disguise.
Markov chain formulation for sequential selection and fusion of strategies, initialized via a proxy LLM and local datasets.
Q-learning-inspired dynamic update mechanism for the Markov transition matrix, enabling the attack to adapt to target model responses in real time.
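
The Markov-matrix idea in the contributions above can be illustrated with a small, abstract sketch. This page does not give the paper's actual update rule, initialization procedure, or strategy implementations, so everything here is an assumption for illustration: strategies are opaque integer indices, the matrix starts uniform (the paper instead initializes it via a proxy LLM and local datasets), the outcome is a scalar `reward` in [0, 1], and the names `make_uniform_matrix`, `sample_next`, and `update_transition` and the learning rate `lr` are hypothetical.

```python
import random


def make_uniform_matrix(n):
    """Row-stochastic n x n matrix; a uniform start is an assumption,
    not the proxy-LLM initialization described in the paper."""
    return [[1.0 / n] * n for _ in range(n)]


def sample_next(matrix, current, rng):
    """Draw the next strategy index from the current strategy's row."""
    return rng.choices(range(len(matrix)), weights=matrix[current], k=1)[0]


def update_transition(matrix, prev, nxt, reward, lr=0.1, floor=1e-6):
    """Q-learning-style step (assumed form): move P[prev][nxt] toward the
    observed reward, then renormalize the row so it sums to 1."""
    row = list(matrix[prev])
    row[nxt] += lr * (reward - row[nxt])
    row[nxt] = max(row[nxt], floor)  # keep every transition reachable
    total = sum(row)
    matrix[prev] = [p / total for p in row]


if __name__ == "__main__":
    rng = random.Random(0)
    P = make_uniform_matrix(3)
    state = 0
    for _ in range(5):
        nxt = sample_next(P, state, rng)
        # Placeholder outcome signal; a real evaluation would come from
        # an external judge, which is outside the scope of this sketch.
        reward = 1.0 if nxt == 2 else 0.0
        update_transition(P, state, nxt, reward)
        state = nxt
```

The renormalization step is one simple way to keep each row a valid probability distribution after a single entry changes; other schemes, such as a softmax over separately stored scores, would serve the same purpose.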

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Datasets

AdvBench

Applications

llm safety alignment, chatbots
