Fan Yang

h-index: 8 · 564 citations · 40 papers (total)

Papers in Database (1)

attack · arXiv · Dec 5, 2025 · Dec 2025

Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models

Fan Yang · Jinan University

Jailbreaks LLMs by rewriting a harmful prompt into a semantically isomorphic safe prompt, generating a response to the safe prompt, and then reverse-mapping that response into harmful output

Prompt Injection · nlp