ML Security Papers

defense arXiv Oct 8, 2025

Tavish McDonald, Bo Lei, Stanislav Fort et al. · Lawrence Livermore National Laboratory · Independent Researcher

Proposes the RICH hypothesis: inference-time compute scaling amplifies VLM adversarial robustness only when the base model is first adversarially trained

Input Manipulation Attack Prompt Injection vision multimodal nlp

defense arXiv Jan 5, 2025

Yang Ouyang, Hengrui Gu, Shuhang Lin et al. · North Carolina State University · Rutgers University +4 more

Defends LLMs against jailbreaks by identifying harmful-token-generating layers and patching them via adversarial unlearning

Prompt Injection nlp

Latest papers