Zhuo Li

Papers in Database (1)

attack · arXiv · Aug 14, 2025 · Aug 2025

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation

Huizhen Shu, Xuying Li, Qirui Wang et al. · hydrox.ai · University of Washington +1 more

Jailbreaks LLMs by perturbing sparse autoencoder features in hidden layers to generate adversarial texts that evade safety defenses

Input Manipulation Attack · Prompt Injection · nlp