Mia Taylor

h-index: 1 · 8 citations · 2 papers (total)

Papers in Database (1)

defense · arXiv · Oct 5, 2025

Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

Daniel Tan, Anders Woodruff, Niels Warncke et al. · University College London · Center on Long-Term Risk +2 more

Proposes inoculation prompting, a training-time technique that suppresses backdoors and emergent misalignment in fine-tuned LLMs at test time.

Model Poisoning · Prompt Injection · NLP
8 citations