Latest papers

1 papers
defense arXiv Feb 8, 2026 · 8w ago

Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation

Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi et al. · KAUST · LUMS

Lightweight ensemble classifier (430M params) that detects LLM jailbreaks and prompt injections, outperforming billion-parameter guardrails

Prompt Injection nlp
PDF Code