ML Security Papers

Latest papers

1 paper

defense · arXiv · Oct 2, 2025

Machine Learning for Detection and Analysis of Novel LLM Jailbreaks

John Hawkins, Aditya Pramar, Rodney Beard et al. · Pingla Institute · UNSW

Fine-tunes BERT to detect LLM jailbreak prompts and finds that reflexivity in prompt structure is a key discriminating signal.

Prompt Injection · NLP

1 citation · PDF