Jindong Gu

h-index: 21 · 1,696 citations · 59 papers (total)

Papers in Database (3)

defense · arXiv · Oct 16, 2025

A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space

Bingjie Zhang, Yibo Yang, Zhe Ren et al. · Jilin University · King Abdullah University of Science and Technology +1 more

Defends LLM safety alignment during fine-tuning by freezing safety-relevant weight subspaces and projecting adapter updates into a harmful-resistant null space

Transfer Learning Attack · Prompt Injection · nlp
3 citations · PDF
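The null-space idea this paper's summary describes can be illustrated with a minimal sketch: remove from a candidate adapter update any component lying in a "safety-sensitive" subspace, so fine-tuning cannot move the weights along those directions. This is an illustrative toy, not the paper's actual algorithm; the matrix names and the SVD-based construction here are assumptions.

```python
import numpy as np

def null_space_project(delta_w: np.ndarray, safety_dirs: np.ndarray) -> np.ndarray:
    """Remove components of `delta_w` lying in the row space of `safety_dirs`.

    Illustrative sketch only: `safety_dirs` stands in for whatever directions a
    guardrail deems safety-sensitive; the projected update is confined to their
    orthogonal complement (a "harmful-resistant" null space).
    """
    # Orthonormal basis of the safety-sensitive subspace via SVD.
    _, s, vt = np.linalg.svd(safety_dirs, full_matrices=False)
    basis = vt[s > 1e-10]  # rows spanning the subspace
    # Project onto the orthogonal complement of that subspace.
    return delta_w - delta_w @ basis.T @ basis

rng = np.random.default_rng(0)
safety = rng.normal(size=(4, 16))    # 4 hypothetical safety-sensitive directions
update = rng.normal(size=(8, 16))    # candidate adapter update
projected = null_space_project(update, safety)
# The projected update has (numerically) no component along the safety directions.
print(np.allclose(projected @ safety.T, 0.0, atol=1e-8))
```

Applying `projected` instead of `update` leaves the weights unchanged along every direction in the safety subspace, which is the intuition behind freezing safety-relevant subspaces during fine-tuning.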
attack · arXiv · Oct 13, 2025

Bag of Tricks for Subverting Reasoning-based Safety Guardrails

Shuo Chen, Zhen Han, Haokun Chen et al. · LMU Munich · Siemens +5 more

Jailbreaks reasoning-based LLM safety guardrails via template tricks and white-box optimization, exceeding a 90% attack success rate

Input Manipulation Attack · Prompt Injection · nlp
1 citation · PDF · Code
attack · arXiv · Oct 13, 2025

Deep Research Brings Deeper Harm

Shuo Chen, Zonggen Li, Zhen Han et al. · LMU Munich · Siemens +6 more

Proposes two jailbreak attacks on LLM research agents — plan injection and intent hijack — that bypass alignment to produce dangerous biosecurity reports

Prompt Injection · Excessive Agency · nlp
PDF · Code