Weitong Ruan

h-index: 2 · 60 citations · 6 papers (total)

Papers in Database (2)

benchmark · arXiv · Oct 4, 2025

How Catastrophic is Your LLM? Certifying Risk in Conversation

Chengxiao Wang, Isha Chaudhary, Qian Hu et al. · University of Illinois · Amazon

Statistical framework that certifies the risk of catastrophic LLM responses in multi-turn conversations via Markov sampling, finding up to 70% certified risk in frontier models

Prompt Injection · nlp
1 citation · PDF
defense · arXiv · Feb 9, 2026

When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

Yuting Ning, Jaylen Jones, Zhehao Zhang et al. · The Ohio State University · Amazon AGI

Guardrail system that detects and corrects misaligned actions in computer-use agents, reducing the success rate of indirect prompt injection attacks by over 90%

Prompt Injection · Excessive Agency · nlp · multimodal
PDF · Code