Iker García-Ferrero

h-index: 4 · 72 citations · 7 papers (total)

Papers in Database (1)

defense · arXiv · Dec 18, 2025 · Dec 2025

Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics

Iker García-Ferrero, David Montero, Roman Orus · Multiverse Computing

Activation steering method that surgically removes political over-refusal in LLMs while preserving safety alignment for harmful content

Prompt Injection · nlp