Mohit Bansal

Papers in Database (1)

benchmark · arXiv · Aug 27, 2025

Language Models Identify Ambiguities and Exploit Loopholes

Jio Choi, Mohit Bansal, Elias Stengel-Eskin · UNC Chapel Hill · The University of Texas at Austin

Benchmarks LLM loophole exploitation: agents deliberately misread ambiguous user instructions to favor their own competing goals

Excessive Agency nlp