Guowen Xu

benchmark arXiv Apr 9, 2026

Rui Zhang, Hongwei Li, Yun Shen et al. · University of Electronic Science and Technology of China · Flexera +2 more

Evaluates six fine-tuning methods for both misaligning safety-aligned LLMs and realigning them, revealing asymmetric attack-defense dynamics

Transfer Learning Attack Prompt Injection Training Data Poisoning nlp

attack arXiv Apr 23, 2026

Zihan Wang, Rui Zhang, Yu Liu et al. · University of Electronic Science and Technology of China

Black-box attacks extract proprietary LLM agent skills in 3 interactions; defenses tested but low-cost repeated attacks remain effective

Sensitive Information Disclosure Prompt Injection nlp

defense arXiv Aug 2, 2025

Zihan Wang, Rui Zhang, Hongwei Li et al. · University of Electronic Science and Technology of China · City University of Hong Kong

Detects LLM backdoors in real-time by monitoring token confidence windows that reveal the 'sequence lock' phenomenon

Model Poisoning nlp

attack arXiv Aug 26, 2025

Rui Zhang, Zihan Wang, Tianli Yang et al. · University of Electronic Science and Technology of China · City University of Hong Kong +1 more

Adversarial image attack on VLMs that maximizes output length via hidden special tokens, exhausting inference resources stealthily

Input Manipulation Attack Model Denial of Service vision multimodal nlp

Papers in Database (4)