Tianwei Zhang

h-index: 9 · 376 citations · 44 papers (total)

Papers in Database (3)

defense CCS Oct 5, 2025

SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models

Peigui Qi, Kunsheng Tang, Wenbo Zhou et al. · University of Science and Technology of China · Nanyang Technological University +1 more

Defends text-to-image models against adversarial prompt evasion attacks using EOS-token embedding detection and safety-aware feature erasure

Input Manipulation Attack · vision · nlp · generative
1 citation

benchmark arXiv Jan 30, 2026

Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures

Yanghao Su, Wenbo Zhou, Tianwei Zhang et al. · University of Science and Technology of China · Nanyang Technological University +2 more

Mechanistic study showing character-disposition fine-tuning creates stronger, transferable LLM misalignment unifying backdoor triggers and jailbreak susceptibility

Model Poisoning · Prompt Injection · nlp

attack arXiv Jan 31, 2026

DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking Systems

Haoran Ou, Kangjie Chen, Gelei Deng et al. · Nanyang Technological University · A*STAR

Agent-based adversarial claim attacks on search-augmented LLM fact-checkers disrupt retrieval and reasoning, dropping accuracy from 78.7% to 53.7%

Prompt Injection · nlp