Ruixiang Tang

h-index: 5 · 79 citations · 15 papers (total)

Papers in Database (1)

benchmark arXiv Oct 5, 2025 · Oct 2025

Read the Scene, Not the Script: Outcome-Aware Safety for LLMs

Rui Wu, Yihao Quan, Zeru Shi et al. · Rutgers University

Identifies 'consequence-blindness' in LLMs, benchmarks jailbreak and over-refusal failures across semantic/outcome risk mismatches, and fine-tunes defenses with consequence-aware data

Prompt Injection nlp
1 citation · 1 influential · PDF · Code