Timothy Baldwin

h-index: 4 · 76 citations · 10 papers (total)

Papers in Database (2)

defense IJCNLP-AACL Oct 19, 2025 · Oct 2025

Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization

Masahiro Kaneko, Zeerak Talat, Timothy Baldwin · MBZUAI · University of Edinburgh

Online learning defense dynamically counters iterative LLM jailbreaks via RL prompt optimization and gradient damping

Prompt Injection nlp
3 citations
benchmark arXiv Oct 19, 2025 · Oct 2025

Bits Leaked per Query: Information-Theoretic Bounds on Adversarial Attacks against LLMs

Masahiro Kaneko, Timothy Baldwin · MBZUAI

Information-theoretic framework bounds LLM adversarial query complexity as log(1/ε)/I(Z;T), quantifying exact security cost of exposing logits or chain-of-thought

Prompt Injection Sensitive Information Disclosure nlp
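As a rough illustration of the bound quoted in the summary above, the sketch below evaluates log(1/ε)/I(Z;T) for given inputs. It assumes the logarithm is base 2 and that I(Z;T) is the mutual information leaked per query, measured in bits; the function name `min_queries` and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def min_queries(epsilon: float, bits_per_query: float) -> float:
    """Evaluate log2(1/epsilon) / I(Z;T).

    epsilon: target error of the attack, assumed to lie in (0, 1).
    bits_per_query: assumed value of I(Z;T) in bits leaked per query.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    if bits_per_query <= 0.0:
        raise ValueError("bits_per_query must be positive")
    return math.log2(1.0 / epsilon) / bits_per_query


# Illustrative values only: a channel leaking 0.5 bits per query
# versus one leaking 2 bits per query, at the same target error.
low_leak = min_queries(2 ** -10, 0.5)
high_leak = min_queries(2 ** -10, 2.0)
print(low_leak, high_leak)
```

Under these assumptions, quadrupling the bits leaked per query cuts the query count given by the formula by a factor of four, which is the kind of trade-off the summary describes for exposing logits or chain-of-thought.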