Nenghai Yu

h-index: 19 · 1,818 citations · 151 papers (total)

Papers in Database (9)

defense · arXiv · Oct 25, 2025

T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models

Jindong Yang, Han Fang, Weiming Zhang et al. · University of Science and Technology of China · Anhui Province Key Laboratory of Digital Security +1 more

Proposes Tail-Truncated Sampling watermarking for diffusion model outputs, balancing robustness and generation diversity

Output Integrity Attack · vision · generative
5 citations · 2 influential
defense · CCS · Oct 5, 2025

SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models

Peigui Qi, Kunsheng Tang, Wenbo Zhou et al. · University of Science and Technology of China · Nanyang Technological University +1 more

Defends text-to-image models against adversarial prompt evasion attacks using EOS-token embedding detection and safety-aware feature erasure

Input Manipulation Attack · vision · nlp · generative
1 citation
attack · arXiv · Sep 21, 2025

Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models

Xingkai Peng, Jun Jiang, Meng Tong et al. · University of Science and Technology of China

Multimodal jailbreak attack on T2I safety filters by decoupling unsafe prompts into image-guided adversarial text components

Prompt Injection · vision · nlp · multimodal · generative
1 citation
defense · arXiv · Nov 10, 2025

LiteUpdate: A Lightweight Framework for Updating AI-Generated Image Detectors

Jiajie Lu, Zhenkan Fu, Na Zhao et al. · University of Science and Technology of China · Shanghai AI Laboratory

Proposes LiteUpdate to efficiently update AI-generated image detectors against new generators while preventing catastrophic forgetting

Output Integrity Attack · vision · generative
1 citation
defense · arXiv · Sep 26, 2025

PSRT: Accelerating LRM-based Guard Models via Prefilled Safe Reasoning Traces

Jiawei Zhao, Yuang Qi, Weiming Zhang et al. · University of Science and Technology of China

Efficient LRM guard model replaces slow reasoning traces with prefilled tokens to detect jailbreaks in one forward pass

Prompt Injection · nlp
defense · arXiv · Jan 28, 2026

SemBind: Binding Diffusion Watermarks to Semantics Against Black-Box Forgery Attacks

Xin Zhang, Zijin Yang, Kejiang Chen et al. · University of Science and Technology of China

Defends diffusion model image watermarks from black-box forgery by semantically binding latent signals via contrastive learning

Output Integrity Attack · vision · generative
benchmark · arXiv · Jan 30, 2026

Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures

Yanghao Su, Wenbo Zhou, Tianwei Zhang et al. · University of Science and Technology of China · Nanyang Technological University +2 more

Mechanistic study showing that character-disposition fine-tuning creates stronger, transferable LLM misalignment, unifying backdoor triggers and jailbreak susceptibility

Model Poisoning · Prompt Injection · nlp
benchmark · arXiv · Jan 29, 2026

WMVLM: Evaluating Diffusion Model Image Watermarking via Vision-Language Models

Zijin Yang, Yu Sun, Kejiang Chen et al. · University of Science and Technology of China · Anhui Province Key Laboratory of Digital Security +1 more

Proposes a unified VLM-based benchmark for evaluating residual and semantic watermarks in diffusion model image outputs

Output Integrity Attack · vision · generative
defense · arXiv · Oct 18, 2025

EditMark: Watermarking Large Language Models based on Model Editing

Shuai Li, Kejiang Chen, Jun Jiang et al. · University of Science and Technology of China · A*STAR +1 more

Embeds 32-bit ownership watermarks into LLM weights via model editing in 20 seconds, enabling copyright verification without training costs

Model Theft · nlp