Kailong Wang

Papers in Database (4)

attack · arXiv · Apr 8, 2026

RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement

Ziye Wang, Guanyu Wang, Kailong Wang · Huazhong University of Science and Technology · Beihang University

Word-level poisoning attack on RAG systems that injects stealthy toxic documents to manipulate LLM outputs via retriever optimization

Data Poisoning Attack · Prompt Injection · Training Data Poisoning · nlp
attack · arXiv · Apr 1, 2026

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou et al. · Huazhong University of Science and Technology · Hubei University

Embeds latent trojans in individually safe LLMs that activate during model merging, bypassing safety alignment

Model Poisoning · AI Supply Chain Attacks · Prompt Injection · nlp
defense · arXiv · Apr 25, 2026

UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks

Tianlong Yu, Yang Yang, Xiao Luo et al. · Hubei University · University of Southern California +1 more

Multi-layer defense against AR-LLM social engineering attacks that combines unlearning, to suppress identity recognition, with agent guardrails

Prompt Injection · Excessive Agency · multimodal · nlp
defense · arXiv · Aug 5, 2025

Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models

Fan Yang, Yihao Huang, Jiayi Zhu et al. · Huazhong University of Science and Technology · National University of Singapore +2 more

Defends diffusion T2I models against NSFW generation by classifying predicted noise mid-generation, robust to adversarial prompts

Output Integrity Attack · vision · generative