Juan Cao

Papers in Database (1)

defense · arXiv · Aug 27, 2025

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks

Sheng Liu, Qiang Sheng, Danding Wang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Proactively synthesizes jailbreak-like training examples using embedding-space analysis to harden LLM safety alignment before attacks emerge

Prompt Injection · nlp