ML Security Papers

Latest papers

2 papers

attack arXiv Apr 11, 2026

Vishal Pramanik, Maisha Maliha, Susmit Jha et al. · University of Oklahoma · University of Florida +1 more

Circuit-level jailbreak attack that uses causal head masking and nullspace steering to bypass LLM safety mechanisms, reporting state-of-the-art success rates

Prompt Injection nlp

defense arXiv Jan 22, 2026

Nishat Raihan, Noah Erdachew, Jayoti Devi et al. · George Mason University · University of Oklahoma +1 more

Defends educational LLM coding assistants from unsafe prompts via PromptShield, a fine-tuned guardrail achieving 0.93 F1

Prompt Injection nlp