Sarah Egler

tool · arXiv · Oct 17, 2025

Sarah Egler, John Schulman, Nicholas Carlini · MATS · Anthropic +1 more

LLM auditing agent detects adversarial fine-tuning attacks, including covert cipher backdoors, before model deployment

Transfer Learning Attack · Model Poisoning · Prompt Injection · nlp

3 citations

Papers in Database (1)