Kejiang Chen

attack arXiv Sep 21, 2025 · Sep 2025

Xingkai Peng, Jun Jiang, Meng Tong et al. · University of Science and Technology of China

Multimodal jailbreak attack on T2I safety filters by decoupling unsafe prompts into image-guided adversarial text components

Prompt Injection vision nlp multimodal generative

1 citation

defense arXiv Oct 18, 2025 · Oct 2025

Shuai Li, Kejiang Chen, Jun Jiang et al. · University of Science and Technology of China · A*STAR +1 more

Embeds 32-bit ownership watermarks into LLM weights via model editing in 20 seconds, enabling copyright verification without training costs

Model Theft nlp

attack arXiv Oct 7, 2025 · Oct 2025

Meng Tong, Yuntao Du, Kejiang Chen et al. · University of Science and Technology of China · Purdue University

Exploits LLM tokenizers as a new membership inference attack vector, achieving AUC 0.771 against state-of-the-art LLM tokenizers

Membership Inference Attack nlp

defense arXiv Jan 28, 2026 · Jan 2026

Xin Zhang, Zijin Yang, Kejiang Chen et al. · University of Science and Technology of China

Defends diffusion model image watermarks from black-box forgery by semantically binding latent signals via contrastive learning

Output Integrity Attack vision generative

benchmark arXiv Jan 29, 2026 · Jan 2026

Zijin Yang, Yu Sun, Kejiang Chen et al. · University of Science and Technology of China · Anhui Province Key Laboratory of Digital Security +1 more

Proposes a unified VLM-based benchmark for evaluating residual and semantic watermarks in diffusion model image outputs

Output Integrity Attack vision generative

defense arXiv Sep 26, 2025 · Sep 2025

Jiawei Zhao, Yuang Qi, Weiming Zhang et al. · University of Science and Technology of China

Efficient LRM guard model replaces slow reasoning traces with prefilled tokens to detect jailbreaks in one forward pass

Prompt Injection nlp

Papers in Database (6)