Sarah Egler

h-index: 1 · 3 citations · 1 paper (total)

Papers in Database (1)

tool · arXiv · Oct 17, 2025

Detecting Adversarial Fine-tuning with Auditing Agents

Sarah Egler, John Schulman, Nicholas Carlini · MATS · Anthropic +1 more

LLM auditing agent detects adversarial fine-tuning attacks, including covert cipher backdoors, before model deployment

Transfer Learning Attack · Model Poisoning · Prompt Injection · NLP
3 citations