Di Wang

defense arXiv Apr 14, 2026 · 5w ago

Shaopeng Fu, Di Wang · King Abdullah University of Science and Technology

Proves why continuous adversarial training defends LLMs against jailbreaks and proposes embedding regularization for better robustness

Input Manipulation Attack Prompt Injection nlp

defense arXiv Sep 17, 2025

Zhaoyang Chu, Yao Wan, Zhikun Zhang et al. · Huazhong University of Science and Technology · Zhejiang University +4 more

Defends code LLMs against sensitive training data extraction by selectively unlearning memorized PII and credentials via gradient ascent

Model Inversion Attack Sensitive Information Disclosure nlp

defense arXiv Mar 9, 2026 · 10w ago

Qishun Yang, Shu Yang, Lijie Hu et al. · King Abdullah University of Science and Technology · China University of Petroleum-Beijing +1 more

Defends VLMs against visual jailbreaks via label-free fine-tuning on neutral threat-image tasks to shape safety-oriented personas

Prompt Injection vision multimodal nlp

attack arXiv Apr 14, 2026 · 5w ago

Qi Li, Cheng-Long Wang, Yinzhi Cao et al. · King Abdullah University of Science and Technology · National University of Singapore +1 more

Membership inference attacks on subset-trained models revealing both training membership and selection participation across data pipelines

Membership Inference Attack vision nlp

defense arXiv Apr 21, 2026 · 4w ago

Jiaming Zhang, Meng Ding, Shaopeng Fu et al. · King Abdullah University of Science and Technology · Renmin University of China +2 more

Theoretical analysis proving Vision Transformers achieve benign overfitting under adversarial training with bounded perturbations

Input Manipulation Attack vision

Papers in Database (5)