Shiyao Cui

Papers in Database (3)

attack arXiv Sep 14, 2025

When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity

Shiyao Cui, Xijia Feng, Yingkang Wang et al. · Tsinghua University · National University of Singapore

Emoji-substituted prompts bypass LLM safety filters, achieving 50% higher toxicity generation than plain-text counterparts across 7 LLMs

Prompt Injection nlp
PDF
attack arXiv Aug 7, 2025

JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering

Renmiao Chen, Shiyao Cui, Xuancheng Huang et al. · Tsinghua University · Zhipu AI +1 more

Jailbreaks VLMs by co-optimizing adversarial image perturbations and multi-agent steering prompts to maximize harmful response quality

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF Code
defense arXiv Apr 13, 2026

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Junxiao Yang, Haoran Liu, Jinzhe Tu et al. · Tsinghua University · Alibaba Group

Defends LLMs against cross-lingual jailbreaks by anchoring safety alignment in language-agnostic semantic representations rather than surface text

Prompt Injection nlp
PDF