Guowen Xu

Papers in Database (4)

benchmark arXiv Apr 9, 2026 · 6w ago

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen et al. · University of Electronic Science and Technology of China · Flexera +2 more

Evaluates six fine-tuning methods for both misaligning safety-aligned LLMs and realigning them, revealing asymmetric attack-defense dynamics

Transfer Learning Attack Prompt Injection Training Data Poisoning nlp
PDF Code
attack arXiv Apr 23, 2026 · 28d ago

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu et al. · University of Electronic Science and Technology of China

Black-box attacks extract proprietary LLM agent skills in 3 interactions; defenses tested but low-cost repeated attacks remain effective

Sensitive Information Disclosure Prompt Injection nlp
PDF
defense arXiv Aug 2, 2025 · Aug 2025

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models

Zihan Wang, Rui Zhang, Hongwei Li et al. · University of Electronic Science and Technology of China · City University of Hong Kong

Detects LLM backdoors in real-time by monitoring token confidence windows that reveal the 'sequence lock' phenomenon

Model Poisoning nlp
PDF Code
attack arXiv Aug 26, 2025 · Aug 2025

Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models

Rui Zhang, Zihan Wang, Tianli Yang et al. · University of Electronic Science and Technology of China · City University of Hong Kong +1 more

Adversarial image attack on VLMs that maximizes output length via hidden special tokens, exhausting inference resources stealthily

Input Manipulation Attack Model Denial of Service visionmultimodalnlp
PDF Code