Shiqian Zhao

Papers in Database (3)

attack · arXiv · Aug 9, 2025

Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models

Shiqian Zhao, Chong Wang, Yiming Li et al. · Nanyang Technological University · National University of Singapore +2 more

Reverse-engineers valuable user prompts from text-to-image (T2I) showcase images by interacting with a local proxy diffusion model

Model Theft · Sensitive Information Disclosure · vision · nlp · generative

attack · arXiv · Mar 6, 2026

Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads

Jinman Wu, Yi Xie, Shiqian Zhao et al. · Xidian University · Tsinghua University +1 more

White-box jailbreak that targets LLM attention heads via layer-wise perturbation, improving attack success rate (ASR) by 14% over the state of the art

Input Manipulation Attack · Prompt Injection · nlp

attack · arXiv · Mar 6, 2026

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Jinman Wu, Yi Xie, Shen Lin et al. · Xidian University · Tsinghua University +2 more

Discovers two disentangled safety subspaces in LLMs and exploits them to surgically disable refusal while preserving harmfulness recognition

Prompt Injection · nlp