Zhuo Li

h-index: 6 · 94 citations · 14 papers (total)

Papers in Database (3)

defense · arXiv · Sep 24, 2025

LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation

Huizhen Shu, Xuying Li, Zhuo Li · hydrox.ai

Defends LLMs against jailbreaks via VAE-supervised latent steering that selectively suppresses adversarial signals while preserving utility

Prompt Injection · nlp
attack · arXiv · Nov 16, 2025

The 'Sure' Trap: Multi-Scale Poisoning Analysis of Stealthy Compliance-Only Backdoors in Fine-Tuned Large Language Models

Yuting Tan, Yi Huang, Zhuo Li · hydrox.ai

Introduces compliance-only LLM backdoor using 'Sure' labels that generalize to harmful outputs when triggered at inference

Model Poisoning · Data Poisoning Attack · Training Data Poisoning · nlp
attack · arXiv · Oct 29, 2025

RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

André V. Duarte, Xuying Li, Bin Zeng et al. · Carnegie Mellon University · Instituto Superior Técnico +1 more

Agentic feedback-loop pipeline extracts memorized copyrighted books from LLMs, improving ROUGE-L by 24% over single-pass extraction

Model Inversion Attack · Sensitive Information Disclosure · nlp