Geoffrey Irving

h-index: 0 0 citations 0 papers (total)

Papers in Database (1)

attack arXiv Feb 16, 2026 · 7w ago

Boundary Point Jailbreaking of Black-Box LLMs

Xander Davies, Giorgi Giglemiani, Edmund Lau et al. · UK AI Security Institute · University of Oxford

Fully black-box automated jailbreak using binary classifier feedback and curriculum learning defeats Anthropic and GPT-5 safety classifiers

Prompt Injection nlp
PDF