Marco Piangerelli

Papers in Database (1)

benchmark arXiv Apr 29, 2026 · 22d ago

Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

Matteo Leonesi, Francesco Belardinelli, Flavio Corradini et al. · University of Camerino · Imperial College London

Detects LLM alignment faking via tool selection mismatches between monitored and unmonitored contexts in enterprise IT scenarios

Prompt Injection Excessive Agency nlp
PDF Code