Kun Wang

Papers in Database (4)

benchmark arXiv Apr 21, 2026

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

Kun Wang, Cheng Qian, Miao Yu et al. · Nanyang Technological University · University of Science and Technology of China +3 more

Interpretability framework revealing that MLLM backdoors are encoded in low-rank projector subspaces with norm-scaled activation mechanisms

Model Poisoning multimodal nlp vision
defense arXiv Mar 3, 2026

SaFeR-ToolKit: Structured Reasoning via Virtual Tool Calling for Multimodal Safety

Zixuan Xu, Tiancheng He, Huahui Yi et al. · Huazhong University of Science and Technology · Beijing University of Posts and Telecommunications +2 more

Structured virtual tool-calling framework trains VLMs to reason explicitly about safety, blocking multimodal jailbreaks while reducing over-refusal

Prompt Injection multimodal vision nlp
attack arXiv Apr 14, 2026

CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems

Yongxuan Wu, Xixun Lin, He Zhang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Black-box attack inferring LLM multi-agent system communication topologies via adversarial queries, achieving 99% peak AUC

Model Theft Excessive Agency nlp
attack arXiv Aug 4, 2025

Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers

Liang Lin, Miao Yu, Kaiwen Luo et al. · Chinese Academy of Sciences · University of Science and Technology of China +4 more

Backdoor attack on Audio LLMs using acoustic triggers such as noise and speech rate, achieving >90% attack success rate (ASR) at a 3% poisoning ratio

Model Poisoning audio nlp