ML Security Papers

Latest papers

2 papers

defense arXiv Apr 21, 2026

Jiaming Zhang, Meng Ding, Shaopeng Fu et al. · King Abdullah University of Science and Technology · Renmin University of China +2 more

Theoretical analysis proving Vision Transformers achieve benign overfitting under adversarial training with bounded perturbations

Input Manipulation Attack vision

defense arXiv Sep 29, 2025

Zihao Zhu, Xinyu Wu, Gehan Hu et al. · The Chinese University of Hong Kong · State University of New York at Buffalo +1 more

Adversarial CoT fine-tuning teaches reasoning models to self-correct harmful drifts, improving jailbreak robustness while reducing over-refusal

Prompt Injection nlp

2 citations