ML Security Papers

Latest papers

4 papers

attack arXiv Apr 11, 2026

Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

Vishal Pramanik, Maisha Maliha, Susmit Jha et al. · University of Oklahoma · University of Florida +1 more

Circuit-level jailbreak attack that uses causal head masking and nullspace steering to bypass LLM safety mechanisms, reporting state-of-the-art success rates

Prompt Injection · nlp


survey arXiv Feb 6, 2026

Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski et al. · IARPA · NIST +13 more

Surveys findings from the multi-year IARPA TrojAI program on AI backdoor detection via weight analysis and trigger inversion

Model Poisoning · vision · nlp


defense arXiv Sep 17, 2025

Privacy Preserving In-Context-Learning Framework for Large Language Models

Bishnu Bhusal, Manoj Acharya, Ramneet Kaur et al. · University of Missouri · SRI International

Protects private in-context learning by applying differential privacy to aggregated token distributions, preventing adversarial extraction of sensitive prompt data

Sensitive Information Disclosure · nlp


survey arXiv Sep 12, 2025

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems

Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello et al. · Instituto de Pesquisas Eldorado · SRI International

Systematic survey of threats and defenses across the full LLM-based system lifecycle, from training to deployment

Data Poisoning Attack · AI Supply Chain Attacks · Prompt Injection · Sensitive Information Disclosure · Insecure Plugin Design · nlp


