Minlie Huang

Papers in Database (2)

attack · arXiv · Sep 14, 2025

When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity

Shiyao Cui, Xijia Feng, Yingkang Wang et al. · Tsinghua University · National University of Singapore

Emoji-substituted prompts bypass LLM safety filters, achieving 50% higher toxicity generation than plain-text counterparts across 7 LLMs

Prompt Injection · nlp
PDF
attack · arXiv · Aug 7, 2025

JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering

Renmiao Chen, Shiyao Cui, Xuancheng Huang et al. · Tsinghua University · Zhipu AI +1 more

Jailbreaks VLMs by co-optimizing adversarial image perturbations and multi-agent steering prompts to maximize harmful response quality

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
PDF · Code