Mia Taylor

h-index: 1 · 8 citations · 2 papers (total)

Papers in Database (1)

defense · arXiv · Oct 5, 2025

Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

Daniel Tan, Anders Woodruff, Niels Warncke et al. · University College London · Center on Long-Term Risk +2 more

Proposes inoculation prompting, a training-time technique that suppresses backdoors and emergent misalignment in fine-tuned LLMs at test time.

Model Poisoning · Prompt Injection · NLP
8 citations