David Montero

h-index: 1 · 28 citations · 4 papers (total)

Papers in Database (1)

defense arXiv Dec 18, 2025 · Dec 2025

Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics

Iker García-Ferrero, David Montero, Roman Orus · Multiverse Computing

Activation steering method that surgically removes political over-refusal in LLMs while preserving safety alignment for harmful content

Prompt Injection nlp