Latest papers

5 papers
attack arXiv Feb 25, 2026

When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

Liangwei Lyu, Jiaqi Xu, Jianwei Ding et al. · People’s Public Security University of China

Injects backdoors into text-to-image diffusion models via malicious LoRA adapters masquerading as benign community-shared modules, achieving a 99.8% attack success rate

Model Poisoning AI Supply Chain Attacks vision generative multimodal
PDF Code
defense arXiv Jan 19, 2026

KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing

Zhenhua Xu, Xiaoning Tian, Wenjun Zeng et al. · Zhejiang University · GenTel.io +4 more

Defends LLM IP by embedding kinship-narrative knowledge into model weights for stealthy, robust ownership verification

Model Theft nlp
PDF Code
attack arXiv Jan 7, 2026

SearchAttack: Red-Teaming LLMs against Knowledge-to-Action Threats under Online Web Search

Yu Yan, Sheng Sun, Mingfeng Li et al. · Institute of Computing Technology · University of Chinese Academy of Sciences +4 more

Red-teams search-augmented LLMs via indirect prompt injection through web search to elicit harmful knowledge-to-action outputs

Prompt Injection nlp
PDF
attack arXiv Sep 4, 2025

MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors

Xin Tong, Zhi Lin, Jingya Wang et al. · People’s Public Security University of China · Tsinghua University +2 more

Factorizes LLM refusal directions into topic-specific vectors to achieve fine-grained, semantically controlled safety alignment bypass

Prompt Injection nlp
PDF
defense arXiv Aug 31, 2025

Unlocking the Effectiveness of LoRA-FP for Seamless Transfer Implantation of Fingerprints in Downstream Models

Zhenhua Xu, Zhaokun Yan, Binhan Xu et al. · Zhejiang University · China Academy of Information and Communications Technology +3 more

Embeds backdoor ownership fingerprints into LoRA adapters for lightweight, transferable LLM IP protection across downstream models

Model Theft nlp
PDF Code