Daniil Dzenhaliou

attack · arXiv · Oct 10, 2025

Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou et al. · MATS · EPFL +4 more

Embeds prompt injections in LLM agent outputs to subvert AI control monitors, collapsing safety-usefulness tradeoffs across protocols

Prompt Injection · Excessive Agency · NLP

5 citations · PDF

Papers in Database (1)