attack 2026

DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection

Junyu Ren ¹, Xingjian Pan ¹, Wensheng Gan ¹, Philip S. Yu ²

¹ Jinan University

² University of Illinois Chicago

Published on arXiv: 2604.12548

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Dual-space mutation achieves 0.189 mean MSR and 0.375 peak MSR on DeepSeek, improving mean MSR by 12.5% over semantic-only and 5.6% over character-only mutations while maintaining strong imperceptibility

Novel Technique Introduced

PromptFuzz-SC

Prompt injection has emerged as a critical security threat to large language models (LLMs), yet existing studies predominantly focus on single-dimensional attack strategies, such as semantic rewriting or character-level obfuscation, which fail to capture the combined effects of multi-space perturbations in realistic scenarios. In addition, systematic black-box robustness evaluations of recent Chinese LLMs, such as DeepSeek, remain limited. To address these gaps, we propose PromptFuzz-SC, a semantic-character dual-space mutation framework for evaluating LLM robustness against prompt injection. The framework integrates semantic transformations (e.g., paraphrasing and word-order perturbation) with character-level obfuscation (e.g., zero-width insertion and encoding-based mutation), forming a unified and extensible mutation operator library. A hybrid search strategy combining epsilon-greedy exploration and hill-climbing refinement is adopted to efficiently discover high-quality adversarial prompts. We further introduce a unified evaluation protocol based on three metrics: Misuse Success Rate (MSR), Average Queries to Success (AQS), and Stealth. Experimental results on DeepSeek demonstrate that dual-space mutation achieves the strongest overall attack performance among the evaluated strategies, attaining the highest mean MSR (0.189), peak MSR (0.375), and mean Stealth. Compared with semantic-only and character-only mutation, it improves mean MSR by 12.5% and 5.6%, respectively. While not consistently minimizing query cost, the proposed method achieves competitive best-case efficiency and maintains strong imperceptibility, indicating a more favorable balance between attack effectiveness and concealment. These findings highlight the importance of composite mutation strategies for robust red-teaming of LLMs and provide practical insights for the design of multi-layer defense mechanisms.
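To make the search idea concrete, here is a minimal sketch of an epsilon-greedy choice over mutation operators combined with hill-climbing acceptance. It is not the authors' implementation: the two operators are simplified stand-ins for the operator library, the choice to apply epsilon-greedy at the operator level is an assumption, and `toy_score` is a hypothetical placeholder for whatever scoring a robustness evaluation would obtain from the model under test.

```python
import random

ZERO_WIDTH_SPACE = "\u200b"


def insert_zero_width(text, rng):
    """Character-space operator: insert one zero-width space at a random position."""
    pos = rng.randrange(len(text) + 1)
    return text[:pos] + ZERO_WIDTH_SPACE + text[pos:]


def swap_adjacent_words(text, rng):
    """Semantic-space stand-in: swap one random pair of adjacent words."""
    words = text.split(" ")
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


OPERATORS = [insert_zero_width, swap_adjacent_words]


def toy_score(text):
    """Hypothetical placeholder objective in [0, 1]; NOT a real model judgment."""
    return min(text.count(ZERO_WIDTH_SPACE), 5) / 5


def hybrid_search(seed, score_fn, budget=50, epsilon=0.2, rng=None):
    """Epsilon-greedy operator selection with hill-climbing acceptance.

    Returns (best_text, best_score, queries_used).
    """
    rng = rng or random.Random(0)
    best, best_score = seed, score_fn(seed)
    queries = 1  # scoring the seed counts as one query
    mean_gain = [0.0] * len(OPERATORS)  # running mean score gain per operator
    counts = [0] * len(OPERATORS)

    while queries < budget:
        if rng.random() < epsilon:
            # Explore: pick any operator uniformly at random.
            k = rng.randrange(len(OPERATORS))
        else:
            # Exploit: pick the operator with the best observed mean gain.
            k = max(range(len(OPERATORS)), key=lambda j: mean_gain[j])

        candidate = OPERATORS[k](best, rng)
        score = score_fn(candidate)
        queries += 1

        counts[k] += 1
        mean_gain[k] += ((score - best_score) - mean_gain[k]) / counts[k]

        # Hill climbing: keep the candidate only if it strictly improves.
        if score > best_score:
            best, best_score = candidate, score

    return best, best_score, queries
```

Because candidates are accepted only on strict improvement, the returned score can never fall below the seed's score, and the query count is bounded by `budget`.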

Key Contributions

PromptFuzz-SC framework integrating semantic transformations (paraphrasing, word-order perturbation) with character-level obfuscation (zero-width insertion, encoding mutation)
Hybrid search strategy combining epsilon-greedy exploration and hill-climbing for efficient adversarial prompt discovery
Unified evaluation protocol with three metrics: Misuse Success Rate (MSR), Average Queries to Success (AQS), and Stealth
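The summary names the metrics without giving formulas, so the following is a sketch under assumed definitions: MSR as the fraction of trials that succeed, and AQS as the mean number of queries over successful trials only. The `Trial` record and both function names are hypothetical. Stealth is omitted because its definition is not stated here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    success: bool  # whether the trial was judged a success
    queries: int   # queries used until success (or until the budget ran out)


def misuse_success_rate(trials: Sequence[Trial]) -> float:
    """Assumed definition: successful trials divided by all trials."""
    if not trials:
        return 0.0
    return sum(t.success for t in trials) / len(trials)


def average_queries_to_success(trials: Sequence[Trial]) -> Optional[float]:
    """Assumed definition: mean query count over successful trials only."""
    wins = [t.queries for t in trials if t.success]
    if not wins:
        return None  # undefined when nothing succeeded
    return sum(wins) / len(wins)


trials = [Trial(True, 4), Trial(False, 50), Trial(True, 10), Trial(False, 50)]
print(misuse_success_rate(trials))         # 0.5
print(average_queries_to_success(trials))  # 7.0
```

Averaging only over successes keeps AQS from being dominated by the query budget of failed trials, which is one reason a method can have the best MSR without the lowest AQS.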

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Applications

conversational AI, content moderation, chatbot safety
