Latest papers

2 papers
benchmark · arXiv · Nov 13, 2025

Say It Differently: Linguistic Styles as Jailbreak Vectors

Srikant Panda, Avinash Rai · Independent Researcher · Oracle AI

Benchmarks 11 linguistic styles (e.g., fear, curiosity, compassion) as jailbreak vectors, raising LLM attack success rates by up to 57 percentage points

Prompt Injection nlp
1 citation · PDF
attack · arXiv · Oct 2, 2025

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar et al. · Georgia Institute of Technology · Oracle AI +1 more

RL + tree-search framework that discovers multi-turn jailbreak strategies, achieving an 81.5% attack success rate (ASR) across 12 LLMs, including Claude-4-Sonnet

Prompt Injection nlp
PDF