Haihua Shen

Papers in Database (1)

benchmark arXiv Apr 29, 2026

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

Wenhao Lan, Shan Li, Junbin Yang et al. · University of Chinese Academy of Sciences · Inner Mongolia University of Technology +1 more

Mechanistic analysis showing adversarial fine-tuning reorganizes LLM refusal representations across layers while navigating robustness-utility tradeoffs

Prompt Injection nlp