Latest papers

4 papers
defense · arXiv · Feb 23, 2026

BarrierSteer: LLM Safety via Learning Barrier Steering

Thanh Q. Tran, Arun Verma, Kiwan Wong et al. · National University of Singapore · Singapore-MIT Alliance for Research and Technology Centre +2 more

Defends LLMs against jailbreaks and adversarial attacks by enforcing control-barrier-function (CBF) safety constraints in the latent representation space at inference time

Input Manipulation Attack · Prompt Injection · nlp
PDF
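The CBF idea in the summary can be illustrated with a toy sketch (an assumption on my part; the paper learns its barrier and steering, whereas here the barrier is a fixed linear function and the steering is a closed-form projection — `steer`, `w`, `b`, and all values are hypothetical):

```python
import numpy as np

def steer(x, w, b, margin=0.0):
    """Minimally correct latent state x so the linear barrier
    h(x) = w.x + b stays >= margin (toy stand-in for a learned
    control barrier function applied at inference time)."""
    h = float(w @ x + b)
    if h >= margin:
        return x  # already safe: no intervention
    # minimal-norm correction along w restores h(x) = margin exactly
    return x + (margin - h) * w / float(w @ w)

# toy 3-d latent space with barrier h(x) = x[0] - 1
w = np.array([1.0, 0.0, 0.0])
b = -1.0
x_unsafe = np.array([0.2, 0.5, -0.3])
x_safe = steer(x_unsafe, w, b)
print(round(float(w @ x_safe + b), 6))  # 0.0 after steering
```

The projection only intervenes when the barrier is violated, which mirrors the inference-time flavor of the approach: safe activations pass through unchanged.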
attack · arXiv · Dec 12, 2025

Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously

Andrew Adiletta, Kathryn Adiletta, Kemal Derya et al. · MITRE · Worcester Polytechnic Institute

Adversarial token suffixes that simultaneously bypass LLM alignment and safety guard models via joint gradient optimization

Input Manipulation Attack · Prompt Injection · nlp
PDF
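The "joint gradient optimization" mechanism can be sketched in miniature. This is a continuous relaxation with two linear detectors standing in for the aligned model and the guard model (the paper attacks real LLMs over discrete tokens; the detectors, losses, and all names here are hypothetical), where one suffix vector is optimized against the summed loss of both:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_suffix_attack(w_align, w_guard, dim, steps=500, lr=0.5):
    """Toy joint attack: one suffix embedding s is driven to fool two
    linear detectors at once by descending the combined loss
    softplus(w_align.s) + softplus(w_guard.s)."""
    rng = np.random.default_rng(0)
    s = rng.normal(size=dim)
    for _ in range(steps):
        g_align = sigmoid(w_align @ s) * w_align  # grad of softplus(w_align.s)
        g_guard = sigmoid(w_guard @ s) * w_guard  # grad of softplus(w_guard.s)
        s -= lr * (g_align + g_guard)             # one joint gradient step
    return s

w_a = np.array([1.0, -0.5, 0.2])   # toy "alignment" detector
w_g = np.array([0.3, 0.8, -0.1])   # toy "guard" detector
s = joint_suffix_attack(w_a, w_g, dim=3)
print(bool(w_a @ s < 0 and w_g @ s < 0))  # True: both detectors report "benign"
```

The key design point the summary highlights is that the two losses share one gradient step, so a single suffix defeats both models rather than needing separate attacks.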
attack · arXiv · Nov 8, 2025

IndirectAD: Practical Data Poisoning Attacks against Recommender Systems for Item Promotion

Zihao Wang, Tianhao Mao, XiaoFeng Wang et al. · Nanyang Technological University · Indiana University Bloomington +2 more

Data poisoning attack that promotes target items in recommender systems via a trigger-item co-occurrence strategy, using only 0.05% fake users

Data Poisoning Attack · tabular
PDF
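The trigger-item co-occurrence idea can be sketched as fake-profile construction: each injected user interacts with popular "trigger" items plus the promoted target, so co-occurrence statistics link the target to items real users already engage with. This is a minimal sketch under assumptions (the paper's actual profile construction is optimized, not random sampling; `make_fake_users` and all values are hypothetical), keeping the 0.05% fake-user budget from the summary:

```python
import random

def make_fake_users(n_real_users, trigger_items, target_item,
                    budget=0.0005, profile_len=10, seed=0):
    """Generate fake user interaction profiles that pair trigger items
    with the target item. budget=0.0005 mirrors the 0.05% fake-user
    budget quoted in the summary."""
    rng = random.Random(seed)
    n_fake = max(1, int(n_real_users * budget))
    fakes = []
    for _ in range(n_fake):
        profile = rng.sample(trigger_items,
                             k=min(profile_len - 1, len(trigger_items)))
        profile.append(target_item)  # every fake profile co-occurs with target
        fakes.append(profile)
    return fakes

fakes = make_fake_users(1_000_000, trigger_items=list(range(100)),
                        target_item=999)
print(len(fakes), all(999 in p for p in fakes))  # 500 True
```

With a million real users, the 0.05% budget yields only 500 fake accounts, which is what makes this attack class practical against deployed systems.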
benchmark · arXiv · Oct 14, 2025

An Investigation of Memorization Risk in Healthcare Foundation Models

Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour et al. · MIT · Broad Institute +6 more

Black-box evaluation framework measuring extractable patient-data memorization in healthcare EHR foundation models at the embedding and generative levels

Model Inversion Attack · tabular
1 citation PDF Code
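A generative-level memorization probe of the kind the summary describes can be sketched as an n-gram overlap check between model outputs and training records (an illustrative assumption: the paper's framework is richer and also probes embeddings; `extraction_rate` and the toy records are hypothetical):

```python
def extraction_rate(generated, training, n=3):
    """Fraction of generated records containing an n-token span that
    appears verbatim in the training corpus -- a simple black-box
    signal of extractable memorization."""
    train_ngrams = set()
    for rec in training:
        toks = rec.split()
        train_ngrams.update(tuple(toks[i:i + n])
                            for i in range(len(toks) - n + 1))
    hits = 0
    for rec in generated:
        toks = rec.split()
        grams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
        if grams and any(g in train_ngrams for g in grams):
            hits += 1
    return hits / len(generated)

train = ["patient 17 diagnosed with type 2 diabetes",
         "patient 22 prescribed metformin 500mg"]
gen = ["patient 17 diagnosed with hypertension",
       "subject given aspirin daily dose"]
print(extraction_rate(gen, train))  # 0.5
```

Because the check only needs model outputs and a held reference corpus, it matches the black-box setting the paper targets: no access to weights or gradients is assumed.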