Kailong Wang

Papers in Database (2)

attack arXiv Apr 1, 2026 · 5d ago

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou et al. · Huazhong University of Science and Technology · Hubei University

Embeds latent trojans in individually safe LLMs that activate during model merging, bypassing safety alignment

Model Poisoning AI Supply Chain Attacks Prompt Injection nlp
PDF
defense arXiv Aug 5, 2025 · Aug 2025

Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models

Fan Yang, Yihao Huang, Jiayi Zhu et al. · Huazhong University of Science and Technology · National University of Singapore +2 more

Defends diffusion T2I models against NSFW generation by classifying predicted noise mid-generation, robust to adversarial prompts

Output Integrity Attack visiongenerative
PDF