Haoran Li

attack arXiv Aug 1, 2025

Jiecong Wang, Haoran Li, Hao Peng et al. · Beihang University · The Hong Kong University of Science and Technology +3 more

Two-stage LLM jailbreak that uses hidden-state activations to guide text-level edits, bypassing safety alignment with state-of-the-art attack success rates

Prompt Injection nlp

attack arXiv Apr 17, 2026

Ki Sen Hung, Xi Yang, Chang Liu et al. · The Hong Kong University of Science and Technology · University of Science and Technology of China

Context-based jailbreak attack achieving a 93%+ attack success rate by exploiting safety-research framing to trigger broad defense relaxation across frontier LLMs

Prompt Injection nlp

Papers in Database (2)