Latest papers

3 papers
attack · arXiv · Oct 13, 2025

Bag of Tricks for Subverting Reasoning-based Safety Guardrails

Shuo Chen, Zhen Han, Haokun Chen et al. · LMU Munich, Siemens, and 5 more affiliations

Jailbreaks reasoning-based LLM safety guardrails via templating tricks and white-box optimization, achieving attack success rates above 90%

Input Manipulation Attack · Prompt Injection · nlp
1 citation · PDF · Code
attack · arXiv · Oct 13, 2025

Deep Research Brings Deeper Harm

Shuo Chen, Zonggen Li, Zhen Han et al. · LMU Munich, Siemens, and 6 more affiliations

Proposes two jailbreak attacks on LLM research agents, plan injection and intent hijack, that bypass safety alignment to produce dangerous biosecurity reports

Prompt Injection · Excessive Agency · nlp
PDF · Code
attack · arXiv · Sep 20, 2025

Can an Individual Manipulate the Collective Decisions of Multi-Agents?

Fengyuan Liu, Rui Zhao, Shuo Chen et al. · Tencent, University of Oxford, and 3 more affiliations

Attacks multi-agent LLM systems with optimized adversarial suffixes, misleading collective decisions while having access to only one agent

Input Manipulation Attack · Prompt Injection · nlp
PDF · Code