Idhant Gulati

h-index: 0 0 citations 5 papers (total)

Papers in Database (1)

attack arXiv Feb 18, 2026 · 6w ago

Narrow fine-tuning erodes safety alignment in vision-language agents

Idhant Gulati, Shivam Raval · University of California · Harvard University

LoRA fine-tuning VLMs on narrow harmful datasets causes emergent safety misalignment that generalizes across modalities, with multimodal evaluation revealing 70% misalignment at rank 128

Transfer Learning Attack Prompt Injection multimodalvisionnlp
PDF