Wenbo Zhou

h-index: 22 · 3,350 citations · 77 papers (total)

Papers in Database (2)

defense · CCS · Oct 5, 2025

SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models

Peigui Qi, Kunsheng Tang, Wenbo Zhou et al. · University of Science and Technology of China · Nanyang Technological University +1 more

Defends text-to-image models against adversarial prompt evasion attacks using EOS-token embedding detection and safety-aware feature erasure

Input Manipulation Attack · vision · nlp · generative
1 citation · PDF · Code

benchmark · arXiv · Jan 30, 2026

Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures

Yanghao Su, Wenbo Zhou, Tianwei Zhang et al. · University of Science and Technology of China · Nanyang Technological University +2 more

Mechanistic study showing character-disposition fine-tuning creates stronger, transferable LLM misalignment unifying backdoor triggers and jailbreak susceptibility

Model Poisoning · Prompt Injection · nlp
PDF