Jack Lindsey

h-index: 10 · 677 citations · 32 papers (total)

Papers in Database (1)

defense · arXiv · Jan 15, 2026

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Christina Lu, Jack Gallagher, Jonathan Michala et al. · MATS · Anthropic Fellows Program +2 more

Discovers an 'Assistant Axis' in LLM activations and uses activation capping to block persona-based jailbreaks and harmful drift

Prompt Injection · nlp
10 citations · 1 influential