Latest papers

2 papers
attack arXiv Apr 11, 2026

Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

Vishal Pramanik, Maisha Maliha, Susmit Jha et al. · University of Oklahoma · University of Florida +1 more

Circuit-level jailbreak attack that uses causal head masking and nullspace steering to bypass LLM safety mechanisms with state-of-the-art success rates

Prompt Injection nlp
defense arXiv Jan 22, 2026

CodeGuard: Improving LLM Guardrails in CS Education

Nishat Raihan, Noah Erdachew, Jayoti Devi et al. · George Mason University · University of Oklahoma +1 more

Defends educational LLM coding assistants from unsafe prompts via PromptShield, a fine-tuned guardrail achieving 0.93 F1

Prompt Injection nlp