Md Rysul Kabir

Papers in Database (1)

benchmark arXiv Apr 20, 2026

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj · Indiana University Bloomington

Compares three LLM jailbreak methods (harmful fine-tuning, RLVR, and abliteration) and shows that they produce markedly different behavioral and mechanistic failure modes despite similar attack success.

Prompt Injection nlp