Polina Petrova

h-index: 1 41 citations 10 papers (total)

Papers in Database (1)

benchmark arXiv Jan 30, 2026 · 9w ago

Assessing Domain-Level Susceptibility to Emergent Misalignment from Narrow Finetuning

Abhishek Mishra, Mugilan Arulvanan, Reshma Ashok et al. · University of Massachusetts Amherst

Benchmarks domain-level LLM misalignment susceptibility from insecure fine-tuning and backdoor triggers, ranking 11 domains from 0% to 87.67% vulnerability

Transfer Learning Attack Model Poisoning nlp
PDF Code