Jialing Tao

h-index: 5 · 99 citations · 14 papers (total)

Papers in Database (3)

defense · arXiv · Sep 25, 2025

A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models

Qinqin He, Jiaqi Weng, Jialing Tao et al. · Alibaba Group

Defends text-to-image diffusion models against harmful content generation by suppressing a single neuron identified with a sparse autoencoder (SAE), while remaining robust to adversarial attacks

Input Manipulation Attack · vision, nlp, generative
4 citations · 1 influential
defense · arXiv · Jan 31, 2026

A Causal Perspective for Enhancing Jailbreak Attack and Defense

Licheng Pan, Yunsheng Lu, Jiexi Liu et al. · Zhejiang University · University of Chicago +1 more

Causal discovery framework identifies interpretable LLM jailbreak drivers to both enhance attacks and improve prompt-level defenses

Prompt Injection · nlp
attack · arXiv · Dec 5, 2025

VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack

Shiji Zhao, Shukun Xiong, Yao Huang et al. · Beihang University · Alibaba Group

Jailbreaks MLLMs by decomposing harmful text into sequential semantically crafted sub-images that aggregate harmful intent across frames

Prompt Injection · vision, nlp, multimodal