Han Qiu

Papers in Database (1)

attack arXiv Sep 14, 2025 · Sep 2025

When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity

Shiyao Cui, Xijia Feng, Yingkang Wang et al. · Tsinghua University · National University of Singapore

Emoji-substituted prompts bypass LLM safety filters, producing 50% more toxic generations than their plain-text counterparts across 7 LLMs

Prompt Injection nlp