Christina Q. Knight

attack · arXiv · Jan 20, 2026

Jackson Kaunismaa, Avery Griffin, John Hughes et al. · MATS · Anthropic +1 more

Bypasses frontier LLM safeguards via adjacent-domain prompts, then fine-tunes open-source models to elicit hazardous chemical synthesis capabilities

Transfer Learning Attack · Prompt Injection · nlp

4 citations

Papers in Database (1)