Samaksh Bhargav

h-index: 0 · 0 citations · 1 paper (total)

Papers in Database (1)

defense ICDMW Oct 26, 2025 · Oct 2025

Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts

Samaksh Bhargav, Zining Zhu · Edison Academy Magnet School · Stevens Institute of Technology

Uses sparse autoencoder (SAE) feature steering, guided by contrasting safe/unsafe prompt pairs, to improve LLM refusal of harmful prompts without sacrificing utility.

Prompt Injection · nlp