Zhan Qin

attack · arXiv · Oct 3, 2025

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based untargeted jailbreak attack that maximizes the probability of an unsafe LLM response without a fixed target response, achieving 80% attack success rate (ASR) in 100 iterations

Input Manipulation Attack · Prompt Injection · nlp

2 citations

attack · arXiv · Oct 2, 2025

Dynamic Target Attack

Kedong Xiu, Churui Zeng, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based jailbreak attack that uses adaptively sampled harmful responses as optimization targets, achieving 87% ASR on safety-aligned LLMs in 200 iterations

Input Manipulation Attack · Prompt Injection · nlp

2 citations

attack · arXiv · Jan 14, 2026

SpatialJB: How Text Distribution Art Becomes the "Jailbreak Key" for LLM Guardrails

Zhiyi Mou, Jingyuan Yang, Zeheng Qian et al. · Zhejiang University · The University of Sydney +2 more

Jailbreaks LLMs by spatially redistributing tokens across rows, columns, or diagonals, bypassing guardrails including the OpenAI Moderation API at >75% ASR

Prompt Injection · nlp

