Ziwei Xu

Attack · arXiv · Jan 27, 2026

Yangyang Guo, Ziwei Xu, Si Liu et al. · National University of Singapore · Beihang University

Fine-tunes LLMs on 1,000 benign samples with refusal prefixes to erase safety alignment across 16 models, including GPT and Gemini.

Transfer Learning Attack · Prompt Injection · NLP

Papers in Database (1)