Latest papers

2 papers
defense · arXiv · Oct 8, 2025

Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness

Tavish McDonald, Bo Lei, Stanislav Fort et al. · Lawrence Livermore National Laboratory · Independent Researcher

Proposes the RICH hypothesis: scaling inference-time compute amplifies a VLM's adversarial robustness only when the base model has first been adversarially trained.

Input Manipulation Attack · Prompt Injection · vision · multimodal · nlp
defense · arXiv · Jan 5, 2025

Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense

Yang Ouyang, Hengrui Gu, Shuhang Lin et al. · North Carolina State University · Rutgers University · and 4 more institutions

Defends LLMs against jailbreaks by identifying the layers that drive harmful (affirmative) token generation and patching them via adversarial unlearning.

Prompt Injection · nlp