Latest papers

1 paper
defense · arXiv · Oct 2, 2025

Machine Learning for Detection and Analysis of Novel LLM Jailbreaks

John Hawkins, Aditya Pramar, Rodney Beard et al. · Pingla Institute · UNSW

Fine-tunes BERT to detect LLM jailbreak prompts, finding reflexivity in prompt structure as a key discriminating signal

Prompt Injection · NLP
1 citation