Yijun Lin

Papers in Database (1)

benchmark arXiv Aug 22, 2025 · Aug 2025

Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs

Yu Yan, Sheng Sun, Zhe Wang et al.

Reveals that jailbreak success rates overstate LLM misuse risk, because models lack real criminal knowledge and LLM judges anchor on toxic language patterns.

Prompt Injection nlp