Hongning Wang

attack arXiv Sep 14, 2025 · Sep 2025

Shiyao Cui, Xijia Feng, Yingkang Wang et al. · Tsinghua University · National University of Singapore

Emoji-substituted prompts bypass LLM safety filters, achieving 50% higher toxicity generation than plain-text counterparts across 7 LLMs

Prompt Injection nlp

attack arXiv Aug 7, 2025 · Aug 2025

Renmiao Chen, Shiyao Cui, Xuancheng Huang et al. · Tsinghua University · Zhipu AI +1 more

Jailbreaks VLMs by co-optimizing adversarial image perturbations and multi-agent steering prompts to maximize harmful response quality

Input Manipulation Attack Prompt Injection visionnlpmultimodal

defense arXiv Apr 13, 2026 · 5w ago

Junxiao Yang, Haoran Liu, Jinzhe Tu et al. · Tsinghua University · Alibaba Group

Defends LLMs against cross-lingual jailbreaks by anchoring safety alignment in language-agnostic semantic representations rather than surface text

Prompt Injection nlp

Papers in Database (3)