attack 2025

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

2 citations · 71 references · Journal of Network and Compute...

Published on arXiv

2510.10281

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

One-shot ASCII art jailbreak successfully bypasses safety alignment on GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3, evading LLaMA Guard and Azure content filters with a reconnaissance-guided methodology

ArtPerception

Novel technique introduced

The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data representations. This paper introduces ArtPerception, a novel black-box jailbreak framework that strategically leverages ASCII art to bypass the security measures of state-of-the-art (SOTA) LLMs. Unlike prior methods that rely on iterative, brute-force attacks, ArtPerception introduces a systematic, two-phase methodology. Phase 1 conducts a one-time, model-specific pre-test to empirically determine the optimal parameters for ASCII art recognition. Phase 2 leverages these insights to launch a highly efficient, one-shot malicious jailbreak attack. We propose a Modified Levenshtein Distance (MLD) metric for a more nuanced evaluation of an LLM's recognition capability. Through comprehensive experiments on four SOTA open-source LLMs, we demonstrate superior jailbreak performance. We further validate our framework's real-world relevance by showing its successful transferability to leading commercial models, including GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3, and by conducting a rigorous effectiveness analysis against potential defenses such as LLaMA Guard and Azure's content filters. Our findings underscore that true LLM security requires defending against a multi-modal space of interpretations, even within text-only inputs, and highlight the effectiveness of strategic, reconnaissance-based attacks. Content Warning: This paper includes potentially harmful and offensive model outputs.

Key Contributions

ArtPerception: a two-phase jailbreak framework using ASCII art to bypass LLM safety alignment — Phase 1 conducts a model-specific pre-test to find optimal ASCII art recognition parameters, Phase 2 launches a one-shot jailbreak attack
Modified Levenshtein Distance (MLD) metric for quantitatively evaluating LLM ASCII art recognition capability
Demonstrated transferability to commercial LLMs (GPT-4o, Claude Sonnet 3.7, DeepSeek-V3) and evasion of defenses including LLaMA Guard and Azure content filters

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Applications

llm chatbotsgeneral-purpose language modelscontent moderation systems

Read PDF arXiv DOI

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space

PUZZLED: Jailbreaking LLMs through Word-Based Puzzles

Anecdoctoring: Automated Red-Teaming Across Language and Place

Boundary Point Jailbreaking of Black-Box LLMs

When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation