Shivam Raval

h-index: 0 0 citations 2 papers (total)

Papers in Database (2)

attack arXiv Feb 18, 2026 · 6w ago

Narrow fine-tuning erodes safety alignment in vision-language agents

Idhant Gulati, Shivam Raval · University of California · Harvard University

LoRA fine-tuning VLMs on narrow harmful datasets causes emergent safety misalignment that generalizes across modalities, with multimodal evaluation revealing 70% misalignment at rank 128

Transfer Learning Attack Prompt Injection multimodalvisionnlp
PDF
benchmark arXiv Sep 16, 2025 · Sep 2025

Towards mitigating information leakage when evaluating safety monitors

Gerard Boxo, Aman Neelappa, Shivam Raval · Independent · Harvard University

Benchmarks LLM safety monitors (linear probes) revealing 10–40% AUROC inflation from textual leakage artifacts, not genuine internal signals

Prompt Injection nlp
PDF