Latest papers

23 papers
attack arXiv Mar 29, 2026 · 10d ago

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang et al. · Hong Kong University of Science and Technology · The Chinese University of Hong Kong +2 more

Semantic backdoor attack on VLMs that injects ads when users ask recommendation questions about specific content categories

Model Poisoning multimodal vision nlp
PDF
attack arXiv Mar 25, 2026 · 14d ago

How Vulnerable Are Edge LLMs?

Ao Ding, Hongzong Li, Zi Liang et al. · China University of Geosciences · Hong Kong University of Science and Technology +4 more

Query-based extraction attack on quantized edge LLMs using clustered instruction queries to steal model behavior efficiently

Model Theft nlp
PDF
attack arXiv Feb 5, 2026 · 8w ago

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

Yao Zhou, Zeen Song, Wenwen Qiang et al. · Institute of Software Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Causal front-door adjustment framework strips LLM safety features via Sparse Autoencoders to achieve state-of-the-art jailbreak success rates

Prompt Injection nlp
PDF
defense arXiv Jan 17, 2026 · 11w ago

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Framework

Zimo Ji, Daoyuan Wu, Wenyuan Jiang et al. · Hong Kong University of Science and Technology · Lingnan University +3 more

Proposes SEAgent, a mandatory access control framework that blocks privilege escalation attacks in LLM agent tool use via information flow monitoring and ABAC policies

Prompt Injection Excessive Agency nlp
1 citation PDF
defense TPAMI Jan 17, 2026 · 11w ago

A Unified Masked Jigsaw Puzzle Framework for Vision and Language Models

Weixin Ye, Wei Wang, Yahui Liu et al. · Beijing Jiaotong University · Kuaishou +4 more

Defends against gradient inversion in federated Transformers by shuffling tokens and masking position embeddings

Model Inversion Attack vision nlp federated-learning
PDF Code
defense arXiv Jan 8, 2026 · Jan 2026

AM$^3$Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs

Han Zhu, Jiale Chen, Chengkun Cai et al. · Hong Kong University of Science and Technology · Sun Yat-Sen University +3 more

GRPO-based safety alignment framework defending MLLMs against multi-turn jailbreaks via a dedicated dataset and turn-aware dual-objective rewards

Prompt Injection multimodal nlp
PDF
attack arXiv Jan 2, 2026 · Jan 2026

Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems

Yueyan Dong, Minghui Xu, Qin Hu et al. · Shandong University · Guangdong University of Finance and Economics +2 more

Exploits LoRA's decoupled A/B matrix aggregation in federated LLM fine-tuning to inject stealthy malicious updates that degrade model quality while evading anomaly detectors

Data Poisoning Attack Transfer Learning Attack nlp federated-learning
PDF
attack arXiv Dec 30, 2025 · Dec 2025

RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress

Ruixuan Huang, Qingyue Wang, Hantao Huang et al. · Hong Kong University of Science and Technology · Nanyang Technological University

Black-box DoS attack exploits MoE router imbalance via repetitive token patterns, causing 3x latency spike on Mixtral-8x7B

Model Denial of Service nlp
PDF
attack arXiv Dec 26, 2025 · Dec 2025

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang, Zhen Sun, Yifan Liao et al. · Hong Kong University of Science and Technology · Nanjing University of Aeronautics and Astronautics +2 more

Proposes BadVSFM, a two-stage backdoor attack on prompt-driven video segmentation models where classic backdoors fail (<5% ASR)

Model Poisoning vision
PDF
defense arXiv Dec 7, 2025 · Dec 2025

AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment

Ruoxin Chen, Jiahui Gao, Kaiqing Lin et al. · Tencent · East China University of Science and Technology +2 more

Proposes task-model alignment combining VLMs and vision models for generalizable AI-generated image detection

Output Integrity Attack vision multimodal
PDF
defense arXiv Dec 5, 2025 · Dec 2025

ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior

Weikai Lu, Ziqian Zeng, Kehua Zhang et al. · South China University of Technology · Hong Kong University of Science and Technology +2 more

Defends MLLMs against multimodal indirect prompt injection by steering instruction-following behavior in activation space

Prompt Injection multimodal nlp
1 citation PDF
defense arXiv Nov 27, 2025 · Nov 2025

RemedyGS: Defend 3D Gaussian Splatting against Computation Cost Attacks

Yanping Li, Zhening Liu, Zijian Li et al. · Hong Kong University of Science and Technology

Defends 3D Gaussian Splatting reconstruction against adversarial texture-poisoning DoS attacks via an ML-based detect-and-purify pipeline

Input Manipulation Attack vision
1 citation PDF
attack arXiv Nov 17, 2025 · Nov 2025

VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language

Zonghao Ying, Moyang Chen, Nizhang Li et al. · Beihang University · Wenzhou-Kean University +4 more

Jailbreaks text-to-video models using benign prompts with auditory triggers and cinematic cues that exploit cross-modal priors

Prompt Injection multimodal generative vision nlp
1 citation PDF Code
benchmark arXiv Nov 13, 2025 · Nov 2025

Phantom Menace: Exploring and Enhancing the Robustness of VLA Models Against Physical Sensor Attacks

Xuancun Lu, Jiaxiang Chen, Shilin Xiao et al. · Zhejiang University · Hong Kong University of Science and Technology

Benchmarks physical sensor attacks (laser, EMI, ultrasound) against VLA robotic models and defends with adversarial training

Input Manipulation Attack multimodal vision audio
PDF Code
defense arXiv Nov 11, 2025 · Nov 2025

3D Guard-Layer: An Integrated Agentic AI Safety System for Edge Artificial Intelligence

Eren Kurshan, Yuan Xie, Paul Franzon · Princeton University · Hong Kong University of Science and Technology +1 more

Proposes 3D-integrated hardware safety layer for edge AI systems that dynamically detects and mitigates inference-time network attacks

Input Manipulation Attack Excessive Agency vision nlp
PDF
benchmark arXiv Nov 10, 2025 · Nov 2025

EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers

Yilin Jiang, Mingzi Zhang, Xuanyu Yin et al. · Zhejiang University of Technology · Hong Kong University of Science and Technology +3 more

Benchmark evaluating teacher-persona jailbreaks on LLMs, revealing a scaling paradox where mid-sized models are most vulnerable

Prompt Injection nlp
PDF Code
attack arXiv Oct 21, 2025 · Oct 2025

FeatureFool: Zero-Query Fooling of Video Models via Feature Map

Duoxun Tang, Xi Xiao, Guangwu Hu et al. · Tsinghua University · Shenzhen University of Information Technology +4 more

Zero-query black-box adversarial video attack using guided backpropagation feature maps to fool classifiers and bypass Video-LLM harmful content detection

Input Manipulation Attack Prompt Injection vision multimodal
1 citation PDF
benchmark arXiv Oct 14, 2025 · Oct 2025

SafeMT: Multi-turn Safety for Multimodal Language Models

Han Zhu, Juntao Dai, Jiaming Ji et al. · Hong Kong University of Science and Technology · Peking University +1 more

Benchmarks multi-turn jailbreak safety of 17 multimodal LLMs and proposes a dialogue safety moderator to reduce attack success rates

Prompt Injection multimodal nlp
3 citations PDF
defense arXiv Oct 9, 2025 · Oct 2025

Provably Robust Adaptation for Language-Empowered Foundation Models

Yuni Lai, Xiaoyu Xue, Linghui Shen et al. · The Hong Kong Polytechnic University · National University of Defense Technology +2 more

Certifiably robust few-shot classifier for CLIP/GraphCLIP using trimmed-mean prototypes and randomized smoothing against support-set poisoning

Data Poisoning Attack vision graph multimodal
1 citation PDF
benchmark arXiv Oct 8, 2025 · Oct 2025

Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent

Weidi Luo, Qiming Zhang, Tianyu Lu et al. · University of Georgia · University of Wisconsin–Madison +6 more

Benchmarks LLM-powered agents' ability to execute end-to-end enterprise intrusions aligned with MITRE ATT&CK TTPs

Excessive Agency Prompt Injection nlp multimodal
4 citations PDF Code