Jialing Tao

defense arXiv · Sep 25, 2025

Qinqin He, Jiaqi Weng, Jialing Tao et al. · Alibaba Group

Defends text-to-image diffusion models against harmful content generation by suppressing a single SAE-identified neuron, while remaining robust to adversarial attacks

Input Manipulation Attack · vision · nlp · generative

4 citations · 1 influential

defense arXiv · Jan 31, 2026

Licheng Pan, Yunsheng Lu, Jiexi Liu et al. · Zhejiang University · University of Chicago +1 more

Causal discovery framework identifies interpretable LLM jailbreak drivers to both enhance attacks and improve prompt-level defenses

Prompt Injection nlp

attack arXiv · Dec 5, 2025

Shiji Zhao, Shukun Xiong, Yao Huang et al. · Beihang University · Alibaba Group

Jailbreaks MLLMs by decomposing harmful text into a sequence of semantically crafted sub-images whose harmful intent aggregates across frames

Prompt Injection · vision · nlp · multimodal

Papers in Database (3)