Latest papers

1 paper
defense arXiv Feb 4, 2026 · 8w ago

$C$-$\Delta\Theta$: Circuit-Restricted Weight Arithmetic for Selective Refusal

Aditya Kasliwal, Pratinav Seth, Vinay Kumar Sankarapu · Lexsi Labs

Moves LLM selective refusal offline via circuit-guided weight editing on <5% of parameters, eliminating recurring inference-time safety hooks

Prompt Injection nlp