Jiaheng Zhang

h-index: 4 · 160 citations · 19 papers (total)

Papers in Database (4)

defense · arXiv · Sep 29, 2025

DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models

Zherui Li, Zheng Nie, Zhenhong Zhou et al. · Beijing University of Posts and Telecommunications · National University of Singapore +5 more

Defends diffusion LLMs against jailbreaks by correcting greedy remasking bias and applying block-level autonomous safety repair

Prompt Injection · nlp
3 citations · 2 influential · PDF · Code

defense · arXiv · Nov 5, 2025

SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking

Wenyuan Yang, Yichen Sun, Changzheng Chen et al. · Sun Yat-Sen University · Zhejiang University +2 more

Watermarks CLIP soft prompts via sequential OOD class ordering to detect if third-party models stole protected prompts

Model Theft · vision · multimodal
PDF

defense · arXiv · Jan 1, 2026

Making Theft Useless: Adulteration-Based Protection of Proprietary Knowledge Graphs in GraphRAG Systems

Weijie Wang, Peizhuo Lv, Yan Wang et al. · Chinese Academy of Sciences · National University of Singapore +2 more

Injects false 'adulterant' facts into proprietary Knowledge Graphs to render stolen copies unusable in competing GraphRAG deployments

Model Theft · nlp · graph
PDF

defense · arXiv · Feb 3, 2026

GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, and Video

Zhenhao Zhu, Yue Liu, Yanpei Guo et al. · Tsinghua University · National University of Singapore +2 more

Reasoning-based omni-modal guardrail using SFT+GRPO to detect harmful text, image, and video LLM outputs

Prompt Injection · multimodal · nlp · vision
PDF · Code