
Special-Character Adversarial Attacks on Open-Source Language Model

Ephraiem Sarabamoun



Published on arXiv (2508.14070)

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

All seven evaluated open-source LLMs (3.8B–32B parameters) exhibit critical vulnerabilities to character-level attacks, producing successful jailbreaks, incoherent outputs, and unrelated hallucinations across all model sizes.

Special-Character Adversarial Attacks

Novel technique introduced


Large language models (LLMs) have achieved remarkable performance across diverse natural language processing tasks, yet their vulnerability to character-level adversarial manipulation presents significant security challenges for real-world deployments. This paper presents a systematic study of special-character attacks, including Unicode, homoglyph, structural, and textual-encoding attacks, aimed at bypassing safety mechanisms. We evaluate seven prominent open-source models ranging from 3.8B to 32B parameters on more than 4,000 attack attempts. These experiments reveal critical vulnerabilities across all model sizes, exposing failure modes that include successful jailbreaks, incoherent outputs, and unrelated hallucinations.


Key Contributions

  • Taxonomy and systematic evaluation of four character-level attack families (Unicode, homoglyph, structural, and encoding obfuscation) against LLM safety mechanisms
  • Empirical study of 4,000+ attack attempts across seven open-source LLMs ranging from 3.8B to 32B parameters, revealing universal vulnerabilities at all model sizes
  • Public release of experimental code, attack datasets, and evaluation protocols to facilitate reproducible research and defense development
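To make the four attack families concrete, the sketch below generates obfuscated variants of a prompt in the spirit of the techniques listed above. This is an illustrative reconstruction, not the paper's released code: the specific homoglyph mapping, the zero-width-space insertion, and the hex encoding are assumptions chosen for demonstration.

```python
# Illustrative character-level obfuscations (assumed mappings, not the
# paper's exact implementation).

# Map ASCII letters to visually similar Cyrillic homoglyphs.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small 'а'
    "c": "\u0441",  # Cyrillic small 'с'
    "e": "\u0435",  # Cyrillic small 'е'
    "o": "\u043e",  # Cyrillic small 'о'
    "p": "\u0440",  # Cyrillic small 'р'
}

ZWSP = "\u200b"  # zero-width space


def homoglyph_attack(prompt: str) -> str:
    """Swap ASCII characters for look-alike Unicode homoglyphs."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in prompt)


def structural_attack(prompt: str) -> str:
    """Insert zero-width spaces to disrupt tokenization boundaries."""
    return ZWSP.join(prompt)


def encoding_attack(prompt: str) -> str:
    """Hex-encode the prompt as a textual-encoding obfuscation."""
    return prompt.encode("utf-8").hex()


if __name__ == "__main__":
    prompt = "example prompt"
    print(homoglyph_attack(prompt))
    print(structural_attack(prompt))
    print(encoding_attack(prompt))
```

A harness like the one evaluated in the paper would send each variant to a target model and score whether the safety filter still refuses; the point of these transforms is that they preserve human readability (or trivial decodability) while changing the token sequence the model's safety training saw.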

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
llm safety/content moderation, open-source language models