Rui Zhang

Papers in Database (3)

benchmark arXiv Apr 9, 2026 · 6w ago

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen et al. · University of Electronic Science and Technology of China · Flexera +2 more

Evaluates six fine-tuning methods for both misaligning safety-aligned LLMs and realigning them, revealing asymmetric attack-defense dynamics

Transfer Learning Attack Prompt Injection Training Data Poisoning nlp
PDF Code
attack arXiv Aug 26, 2025 · Aug 2025

Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models

Rui Zhang, Zihan Wang, Tianli Yang et al. · University of Electronic Science and Technology of China · City University of Hong Kong +1 more

Adversarial image attack on VLMs that maximizes output length via hidden special tokens, exhausting inference resources stealthily

Input Manipulation Attack Model Denial of Service visionmultimodalnlp
PDF Code
defense arXiv Aug 2, 2025 · Aug 2025

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models

Zihan Wang, Rui Zhang, Hongwei Li et al. · University of Electronic Science and Technology of China · City University of Hong Kong

Detects LLM backdoors in real-time by monitoring token confidence windows that reveal the 'sequence lock' phenomenon

Model Poisoning nlp
PDF Code