attack 2026

Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

Zonghao Ying 1, Haowen Dai 2, Lianyu Hu 1, Zonglei Jing 1, Quanchen Zou 3, Yaodong Yang 4, Aishan Liu 1, Xianglong Liu 5,1

0 citations

Published on arXiv

2604.05853

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves 65.57% average attack success rate across 7 T2I models (peaking at 91.00%), significantly outperforming existing baselines

Etch

Novel technique introduced


Modern text-to-image (T2I) models can now render legible, paragraph-length text, enabling a fundamentally new class of misuse. We identify and formalize the inscriptive jailbreak, where an adversary coerces a T2I system into generating images containing harmful textual payloads (e.g., fraudulent documents) embedded within visually benign scenes. Unlike traditional depictive jailbreaks that elicit visually objectionable imagery, inscriptive attacks weaponize the text-rendering capability itself. Because existing jailbreak techniques are designed for coarse visual manipulation, they struggle to bypass multi-stage safety filters while maintaining character-level fidelity. To expose this vulnerability, we propose Etch, a black-box attack framework that decomposes the adversarial prompt into three functionally orthogonal layers: semantic camouflage, visual-spatial anchoring, and typographic encoding. This decomposition reduces joint optimization over the full prompt space to tractable sub-problems, which are iteratively refined through a zero-order loop. In this process, a vision-language model critiques each generated image, localizes failures to specific layers, and prescribes targeted revisions. Extensive evaluations across 7 models on 2 benchmarks demonstrate that Etch achieves an average attack success rate of 65.57% (peaking at 91.00%), significantly outperforming existing baselines. Our results reveal a critical blind spot in current T2I safety alignment and underscore the urgent need for typography-aware multimodal defense mechanisms.


Key Contributions

  • Formalizes the inscriptive jailbreak attack, distinguishing it from depictive jailbreaks by its weaponization of the text-rendering capability
  • Proposes the Etch framework, which decomposes the prompt into three layers (semantic camouflage, visual-spatial anchoring, typographic encoding)
  • Introduces a zero-order optimization loop in which a VLM critic drives iterative prompt refinement without gradient access
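
The idea of a layer-wise zero-order loop can be illustrated with a toy numeric analogue. The sketch below is not the authors' implementation and involves no prompts, images, or models: each "layer" is a single number, `black_box_score` is a hypothetical stand-in for the generate-then-critique step, and `refine` is an assumed name for a simple random-search loop. It shows only the general pattern of querying a black box, localizing the largest failure to one layer, and revising that layer alone without gradients.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layer names, chosen to mirror the three-layer decomposition.
LAYERS = ["semantic", "spatial", "typographic"]


def black_box_score(params: Dict[str, float],
                    target: Dict[str, float]) -> Dict[str, float]:
    """Toy stand-in for a critic: returns one error value per layer.

    The caller sees only these scores, never a gradient.
    """
    return {k: abs(params[k] - target[k]) for k in LAYERS}


def refine(params: Dict[str, float],
           target: Dict[str, float],
           steps: int = 200,
           step_size: float = 0.5,
           seed: int = 0) -> Tuple[Dict[str, float], List[float]]:
    """Random-search refinement that perturbs one layer per iteration."""
    rng = random.Random(seed)
    params = dict(params)
    history: List[float] = []
    for _ in range(steps):
        errors = black_box_score(params, target)
        total = sum(errors.values())
        history.append(total)
        # Localize the failure: pick the layer with the largest error.
        worst = max(errors, key=errors.get)
        # Propose a revision to that layer only.
        candidate = dict(params)
        candidate[worst] += rng.uniform(-step_size, step_size)
        # Keep the revision only if the black-box score improves.
        if sum(black_box_score(candidate, target).values()) < total:
            params = candidate
    return params, history


if __name__ == "__main__":
    start = {name: 0.0 for name in LAYERS}
    goal = {"semantic": 3.0, "spatial": -2.0, "typographic": 1.0}
    final, history = refine(start, goal)
    print("initial error:", history[0])
    print("last recorded error:", history[-1])
```

Because each step changes a single layer, every sub-problem is one-dimensional, which is the sense in which a decomposition can make a joint search more tractable. The accept-only-if-better rule is a design choice of this sketch, not a detail taken from the paper.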

🛡️ Threat Analysis


Details

Domains
multimodal, vision, nlp
Model Types
multimodal, diffusion
Threat Tags
black_box, inference_time, targeted
Applications
text-to-image generation, content moderation evasion