Kevin Scaria

Papers in Database (1)

defense · arXiv · Apr 6, 2026

Gradient-Controlled Decoding: A Safety Guardrail for LLMs with Dual-Anchor Steering

Purva Chiniya, Kevin Scaria, Sagar Chaturvedi · Amazon

Combines dual-anchor gradient detection with deterministic refusal-token injection to prevent LLM jailbreaks while reducing false positives by 52%.

Prompt Injection · NLP