Shengyun Si

h-index: 5 · 87 citations · 8 papers (total)

Papers in Database (1)

attack · arXiv · Oct 13, 2025 · Oct 2025

Bag of Tricks for Subverting Reasoning-based Safety Guardrails

Shuo Chen, Zhen Han, Haokun Chen et al. · LMU Munich · Siemens +5 more

Jailbreaks reasoning-based LLM safety guardrails via template tricks and white-box optimization, exceeding 90% attack success rate

Input Manipulation Attack · Prompt Injection · nlp
1 citation