Kui Ren

h-index: 21 · 1,294 citations · 58 papers (total)

Papers in Database (4)

Attack · arXiv · Oct 3, 2025

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based untargeted jailbreak attack that maximizes the LLM's unsafety probability without a fixed target response, achieving 80% ASR within 100 iterations

Input Manipulation Attack · Prompt Injection · NLP
2 citations
Attack · arXiv · Sep 28, 2025

Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack

Yukun Chen, Boheng Li, Yu Yuan et al. · Zhejiang University · Nanyang Technological University

Bilevel-optimization backdoor attack on teacher models that evades detection yet activates in student models during knowledge distillation

Model Poisoning · Transfer Learning Attack · Vision
2 citations (1 influential)
Attack · arXiv · Oct 2, 2025

Dynamic Target Attack

Kedong Xiu, Churui Zeng, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based jailbreak attack that adaptively samples harmful responses as optimization targets, achieving 87% ASR on safety-aligned LLMs within 200 iterations

Input Manipulation Attack · Prompt Injection · NLP
2 citations
Attack · arXiv · Jan 14, 2026

SpatialJB: How Text Distribution Art Becomes the "Jailbreak Key" for LLM Guardrails

Zhiyi Mou, Jingyuan Yang, Zeheng Qian et al. · Zhejiang University · The University of Sydney +2 more

Jailbreaks LLMs by spatially redistributing prompt tokens across rows, columns, and diagonals, bypassing guardrails including the OpenAI Moderation API at >75% ASR

Prompt Injection · NLP