Yongbin Zhou

defense · arXiv · Sep 24, 2025

SafeSteer: Adaptive Subspace Steering for Efficient Jailbreak Defense in Vision-Language Models

Xiyu Zeng, Siyuan Liang, Liming Lu et al. · Nanjing University of Science and Technology · Nanyang Technological University +1 more

Inference-time SVD-based activation steering defends VLMs against visual jailbreaks while preserving utility and efficiency

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal

1 citation

defense · arXiv · Nov 26, 2025

Multimodal Robust Prompt Distillation for 3D Point Cloud Models

Xiang Gu, Liming Lu, Xu Zheng et al. · Nanjing University of Science and Technology · The Hong Kong University of Science and Technology (Guangzhou) +3 more

Defends 3D point cloud models against adversarial attacks via multimodal teacher-student prompt distillation with zero inference overhead

Input Manipulation Attack · vision · multimodal


defense · arXiv · Sep 25, 2025

FERD: Fairness-Enhanced Data-Free Robustness Distillation

Zhengxiao Li, Liming Lu, Xu Zheng et al. · Nanjing University of Science and Technology · HKUST(GZ) +3 more

Fairness-enhanced data-free distillation reduces per-class adversarial robustness disparity in student models via reweighted synthetic adversarial examples

Input Manipulation Attack · vision


tool · arXiv · Dec 22, 2025

DREAM: Dynamic Red-teaming across Environments for AI Models

Liming Lu, Xiang Gu, Junyu Huang et al. · Nanjing University of Science and Technology · The University of Hong Kong +3 more

Automated red-teaming tool for LLM agents that chains 1,986 atomic attacks across 349 environments, achieving bypass rates above 70%

Prompt Injection · Excessive Agency · nlp

