Phil Blandfort

attack arXiv Nov 1, 2025 · Nov 2025

Phil Blandfort, Robert Graham · Independent

Black-box LLM red-teaming scaffold that uses iterative in-context learning (ICL) to craft natural-language inputs that evade activation-probe safety monitors

Input Manipulation Attack · Prompt Injection · nlp

Papers in Database (1)