Haoran Li

Papers in Database (2)

attack arXiv Aug 1, 2025

Activation-Guided Local Editing for Jailbreaking Attacks

Jiecong Wang, Haoran Li, Hao Peng et al. · Beihang University · The Hong Kong University of Science and Technology +3 more

Two-stage LLM jailbreak uses hidden-state activations to guide text-level edits, bypassing safety alignment with state-of-the-art attack success rates

Prompt Injection nlp
attack arXiv Apr 17, 2026

Into the Gray Zone: Domain Contexts Can Blur LLM Safety Boundaries

Ki Sen Hung, Xi Yang, Chang Liu et al. · The Hong Kong University of Science and Technology · University of Science and Technology of China

Context-based jailbreak attack achieving 93%+ success by exploiting safety-research framing to trigger broad defense relaxation across frontier LLMs

Prompt Injection nlp