Published on arXiv

2602.22983

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

CC-BOS achieves nearly 100% attack success rate across six representative LLMs, consistently outperforming state-of-the-art jailbreak methods

CC-BOS

Novel technique introduced


As Large Language Models (LLMs) see increasingly broad deployment, their security risks have drawn growing attention. Existing research shows that LLMs are highly susceptible to jailbreak attacks, whose effectiveness varies across language contexts. This paper investigates the role of classical Chinese in jailbreak attacks. Owing to its conciseness and obscurity, classical Chinese can partially bypass existing safety constraints, exposing notable vulnerabilities in LLMs. Based on this observation, this paper proposes CC-BOS, a framework for the automatic generation of classical Chinese adversarial prompts based on multi-dimensional fruit fly optimization, enabling efficient and automated jailbreak attacks in black-box settings. Prompts are encoded into eight policy dimensions (role, behavior, mechanism, metaphor, expression, knowledge, trigger pattern, and context) and iteratively refined via smell search, visual search, and Cauchy mutation. This design enables efficient exploration of the search space, thereby enhancing the effectiveness of black-box jailbreak attacks. To improve readability and evaluation accuracy, we further design a classical Chinese-to-English translation module. Extensive experiments demonstrate the effectiveness of the proposed CC-BOS, which consistently outperforms state-of-the-art jailbreak attack methods.
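The search procedure described above can be sketched as a minimal fruit-fly-style optimization loop. This is an illustrative assumption of how the pieces fit together, not the authors' implementation: the dimension values, the fitness function, and all parameter names here are hypothetical, and a real attack would score candidates by querying the target LLM with the assembled prompt.

```python
import math
import random

# The eight policy dimensions named in the abstract.
DIMENSIONS = ["role", "behavior", "mechanism", "metaphor",
              "expression", "knowledge", "trigger_pattern", "context"]

def score(candidate):
    # Placeholder fitness (assumption): in the actual attack this would
    # assemble a classical Chinese prompt from the candidate's dimension
    # settings, query the target LLM, and rate the harmfulness of the reply.
    return -sum((v - 0.5) ** 2 for v in candidate.values())

def cauchy_mutation(value, scale=0.1):
    # Cauchy-distributed perturbation: heavy tails permit occasional large
    # jumps, helping the swarm escape local optima. Clamped to [0, 1].
    step = scale * math.tan(math.pi * (random.random() - 0.5))
    return min(1.0, max(0.0, value + step))

def smell_search(best, population_size=10):
    # "Smell" phase: scatter a swarm of mutated candidates around the
    # current best strategy vector.
    return [{d: cauchy_mutation(v) for d, v in best.items()}
            for _ in range(population_size)]

def optimize(iterations=50, seed=0):
    random.seed(seed)
    best = {d: random.random() for d in DIMENSIONS}
    best_fit = score(best)
    for _ in range(iterations):
        swarm = smell_search(best)
        # "Visual" phase: the swarm converges on the best-scoring candidate,
        # which seeds the next round of smell search.
        for cand in swarm:
            fit = score(cand)
            if fit > best_fit:
                best, best_fit = cand, fit
    return best, best_fit
```

Because only `score` touches the target model, the loop is fully black-box: no gradients or logits are needed, which matches the attack setting the paper targets.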


Key Contributions

  • Introduces classical Chinese as a novel adversarial context for LLM jailbreaks, identifying a safety blind spot where guardrails trained on modern languages fail to detect harmful intent
  • Proposes CC-BOS, a bio-inspired (fruit fly optimization) framework that encodes jailbreak prompts as an 8-dimensional strategy space and iteratively refines them via smell search, visual search, and Cauchy mutation
  • Designs a two-stage classical Chinese-to-English translation module to ensure reliable evaluation of model responses in cross-lingual jailbreak scenarios

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time, targeted
Datasets
AdvBench
Applications
large language model safety alignment, llm jailbreaking