Ruixiang Tang

benchmark · arXiv · Oct 5, 2025

Read the Scene, Not the Script: Outcome-Aware Safety for LLMs

Rui Wu, Yihao Quan, Zeru Shi et al. · Rutgers University

Identifies 'consequence-blindness' in LLMs, benchmarks jailbreak and over-refusal failures across semantic/outcome risk mismatches, and fine-tunes defenses with consequence-aware data

Prompt Injection · nlp

1 citation · 1 influential · PDF · Code
