Shangze Li

h-index: 2 · 26 citations · 12 papers (total)

Papers in Database (1)

defense · arXiv · Jan 29, 2026

TraceRouter: Robust Safety for Large Foundation Models via Path-Level Intervention

Chuancheng Shi, Shangze Li, Wenjun Lu et al. · The University of Sydney · Nanjing University of Science and Technology +2 more

Defends LLMs, diffusion models, and MLLMs against jailbreaks by tracing and severing harmful semantic circuits, using sparse autoencoders and causal path analysis.

Input Manipulation Attack · Prompt Injection · nlp · vision · multimodal · generative