Jing Shao

attack arXiv Oct 17, 2025 · Oct 2025

Yuexiao Liu, Lijun Li, Xingjun Wang et al. · Tsinghua University · Shanghai Artificial Intelligence Laboratory

Exploits RLVR fine-tuning with 64 harmful prompts to rapidly reverse LLM safety alignment at 96% attack success rate

Transfer Learning Attack nlp

1 citation · 1 influential

attack arXiv Oct 13, 2025 · Oct 2025

Pengyu Zhu, Lijun Li, Yaxing Lyu et al. · Beijing University of Posts and Telecommunications · Shanghai Artificial Intelligence Laboratory +2 more

Distributed backdoor attack on LLM multi-agent systems via tool-embedded primitives activated by agent collaboration sequences

Model Poisoning Insecure Plugin Design nlp

attack arXiv Sep 30, 2025 · Sep 2025

Shaoxiong Guo, Tianyi Du, Lijun Li et al. · Shanghai Artificial Intelligence Laboratory · East China Normal University +2 more

Multi-turn narrative jailbreak exploiting UMM generation-understanding coupling to bypass safety alignment via story framing

Prompt Injection multimodal nlp vision

attack arXiv Dec 2, 2025 · Dec 2025

Yuan Xiong, Ziqi Miao, Lijun Li et al. · Shanghai Artificial Intelligence Laboratory · Xi’an Jiaotong University +1 more

Jailbreaks multimodal LLMs by embedding harmful queries in crafted visual contexts via a multi-agent image generation system

Prompt Injection vision multimodal nlp

Papers in Database (4)