Mark Russinovich

h-index: 22 4,095 citations 59 papers (total)

Papers in Database (2)

attack arXiv Feb 5, 2026 · 8w ago

GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt

Mark Russinovich, Yanan Cai, Keegan Hines et al. · Microsoft

Uses GRPO reinforcement fine-tuning with a single prompt to strip safety alignment from LLMs and diffusion models, outperforming prior unalignment attacks

Transfer Learning Attack Prompt Injection nlpgenerative

PDF

defense arXiv Feb 11, 2026 · 7w ago

Optimizing Agent Planning for Security and Autonomy

Aashish Kolluri, Rishi Sharma, Manuel Costa et al. · Microsoft · EPFL +1 more

Defends AI agents against indirect prompt injection via security-aware planning that maximizes autonomous operation without human oversight

Prompt Injection Excessive Agency nlp

PDF