Hongwei Li

benchmark arXiv Apr 9, 2026 · 6w ago

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen et al. · University of Electronic Science and Technology of China · Flexera +2 more

Evaluates six fine-tuning methods for both misaligning safety-aligned LLMs and realigning them, revealing asymmetric attack-defense dynamics

Transfer Learning Attack Prompt Injection Training Data Poisoning nlp

PDF Code

defense arXiv Aug 2, 2025 · Aug 2025

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models

Zihan Wang, Rui Zhang, Hongwei Li et al. · University of Electronic Science and Technology of China · City University of Hong Kong

Detects LLM backdoors in real-time by monitoring token confidence windows that reveal the 'sequence lock' phenomenon

Model Poisoning nlp

PDF Code

defense arXiv Jan 11, 2025 · Jan 2025

DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Wenshu Fan, Minxing Zhang, Hongwei Li et al. · University of Electronic Science and Technology of China · CISPA Helmholtz Center for Information Security +1 more

Introduces adaptive gallery-update attack breaking all AFR defenses, then counters with diverse adversarial perturbations for facial privacy

Input Manipulation Attack vision

PDF Code

attack arXiv Aug 26, 2025 · Aug 2025

Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models

Rui Zhang, Zihan Wang, Tianli Yang et al. · University of Electronic Science and Technology of China · City University of Hong Kong +1 more

Adversarial image attack on VLMs that maximizes output length via hidden special tokens, exhausting inference resources stealthily

Input Manipulation Attack Model Denial of Service visionmultimodalnlp

PDF Code

Papers in Database (4)

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models

DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models