Latest papers

2 papers
Defense · arXiv · Feb 23, 2026 (6w ago)

BarrierSteer: LLM Safety via Learning Barrier Steering

Thanh Q. Tran, Arun Verma, Kiwan Wong et al. · National University of Singapore · Singapore-MIT Alliance for Research and Technology Centre +2 more

Defends LLMs against jailbreaks and adversarial attacks by enforcing control-barrier-function (CBF) safety constraints in the latent representation space at inference time

Input Manipulation Attack · Prompt Injection · NLP
PDF
Benchmark · arXiv · Feb 6, 2026 (8w ago)

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

Saad Hossain, Tom Tseng, Punya Syon Pandey et al. · Critical ML Lab · FAR.AI +6 more

A benchmark framework for evaluating LLM tamper resistance across 9 fine-tuning and weight-space attacks on 21 open-weight models

Transfer Learning Attack · Prompt Injection · NLP
1 citation · PDF · Code