Marco Piangerelli

benchmark arXiv Apr 29, 2026

Matteo Leonesi, Francesco Belardinelli, Flavio Corradini et al. · University of Camerino · Imperial College London

Detects LLM alignment faking via mismatches in tool selection between monitored and unmonitored contexts in enterprise IT scenarios

Prompt Injection · Excessive Agency · nlp

Papers in Database (1)