Yun Shen

Papers in Database (1)

benchmark arXiv Apr 9, 2026 · 6w ago

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen et al. · University of Electronic Science and Technology of China · Flexera +2 more

Evaluates six fine-tuning methods for both misaligning safety-aligned LLMs and realigning them, revealing asymmetric attack-defense dynamics

Transfer Learning Attack Prompt Injection Training Data Poisoning nlp
PDF Code