Latest papers

6 papers
attack arXiv Jan 19, 2026

ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Jesus-German Ortiz-Barajas, Jonathan Tonglet, Vivek Gupta et al. · INSAIT · Sofia University +3 more

Jailbreaks MLLMs via adversarial prompting to auto-generate misleading charts, reducing human and MLLM QA accuracy by ~20 points

Prompt Injection · multimodal · vision · nlp
PDF Code
defense arXiv Nov 26, 2025

Multimodal Robust Prompt Distillation for 3D Point Cloud Models

Xiang Gu, Liming Lu, Xu Zheng et al. · Nanjing University of Science and Technology · The Hong Kong University of Science and Technology (Guangzhou) +3 more

Defends 3D point cloud models against adversarial attacks via multimodal teacher-student prompt distillation with zero inference overhead

Input Manipulation Attack · vision · multimodal
PDF Code
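For intuition on the teacher-student distillation this defense builds on, here is a minimal NumPy sketch of a generic temperature-scaled distillation loss (Hinton-style soft targets). This is an illustrative base primitive, not the paper's multimodal robust prompt objective; the function names are hypothetical.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    The student is trained to match the teacher's soft predictions;
    the T^2 factor keeps gradient magnitudes comparable across T.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)
```

When student and teacher logits agree, the loss is zero; any mismatch yields a positive penalty, which is what lets a robust teacher's behavior transfer to a lightweight student.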
attack arXiv Oct 28, 2025

SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning

Alexander Bakarsky, Dimitar I. Dimitrov, Maximilian Baader et al. · ETH Zürich · INSAIT +1 more

Scales gradient inversion attacks in federated learning to 10x larger batch sizes using sparse dictionary learning

Model Inversion Attack federated-learning
PDF
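A toy NumPy sketch of the batch-size-1 base case that gradient inversion attacks exploit: for a linear layer, the weight gradient is a rank-one outer product, so the private input falls out by elementwise division. SPEAR++'s contribution is scaling past this trivial case to real batch sizes via sparse dictionary learning, which this sketch does not attempt; all variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: one linear layer y = Wx + b, batch size 1, loss 0.5*||y - t||^2.
x = rng.normal(size=4)            # private input the attacker wants
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
target = rng.normal(size=3)

g = W @ x + b - target            # dL/dy for the squared-error loss
grad_W = np.outer(g, x)           # dL/dW = g x^T  -- rank one
grad_b = g                        # dL/db = g

# The attacker observes only (grad_W, grad_b), as in federated learning.
# Row j of grad_W equals g[j] * x, so dividing by the matching bias
# gradient recovers the input exactly.
j = int(np.argmax(np.abs(grad_b)))
x_rec = grad_W[j] / grad_b[j]
```

With larger batches the gradient becomes a sum of such outer products, and disentangling the individual inputs is exactly the sparse-decomposition problem the paper targets.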
defense arXiv Oct 7, 2025

Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?

Qingyu Yin, Chak Tou Leong, Linyi Yang et al. · Zhejiang University · Xiaohongshu Inc. +6 more

Reveals mechanistic cause of safety alignment failure in reasoning LLMs and proposes data-efficient alignment repair via refusal cliff data selection

Prompt Injection · nlp
2 citations PDF Code
defense arXiv Sep 25, 2025

FERD: Fairness-Enhanced Data-Free Robustness Distillation

Zhengxiao Li, Liming Lu, Xu Zheng et al. · Nanjing University of Science and Technology · HKUST(GZ) +3 more

Fairness-enhanced data-free distillation reduces per-class adversarial robustness disparity in student models via reweighted synthetic adversarial examples

Input Manipulation Attack · vision
PDF
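A hedged sketch of the reweighting idea in NumPy: classes where the student's robust accuracy is lowest receive proportionally larger loss weights, shrinking the per-class robustness gap. The `fairness_weights` helper is a generic inverse-robustness scheme for illustration, not FERD's actual weighting rule.

```python
import numpy as np

def fairness_weights(per_class_robust_acc, eps=1e-6):
    """Upweight classes where the student is least robust.

    Inverse-accuracy weights, normalized to mean 1 so the overall
    loss scale is unchanged; harder classes get weight > 1.
    """
    acc = np.asarray(per_class_robust_acc, dtype=float)
    w = 1.0 / (acc + eps)         # eps guards against zero accuracy
    return w * len(w) / w.sum()   # normalize to mean 1

# A class at 20% robust accuracy gets ~3x the weight of one at 60%.
w = fairness_weights([0.6, 0.4, 0.2])
```

Applying such weights to per-class losses on synthetic adversarial examples pushes the distillation objective toward the worst-off classes instead of average robustness.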
defense arXiv Sep 16, 2025

CIARD: Cyclic Iterative Adversarial Robustness Distillation

Liming Lu, Shuchao Pang, Xu Zheng et al. · Nanjing University of Science and Technology · HKUST(GZ) +4 more

Defends lightweight student models against adversarial attacks via cyclic multi-teacher distillation with contrastive alignment and continuous adversarial retraining

Input Manipulation Attack · vision
PDF Code