Fan Yang

Papers in Database (2)

defense · arXiv · Aug 9, 2025

The Cost of Thinking: Increased Jailbreak Risk in Large Language Models

Fan Yang · Jinan University

Shows that thinking-mode LLMs are more vulnerable to jailbreaks, and proposes a defense, safe thinking intervention, that steers the model using special tokens

Prompt Injection · NLP
attack · arXiv · Mar 10, 2026

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Fan Yang · Jinan University

Jailbreaks thinking-mode LLMs by interleaving multiple concurrent task streams, character reversal, and format constraints within a single prompt

Prompt Injection · NLP