Latest papers

46 papers
attack arXiv Apr 2, 2026 · 4d ago

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models with 96.7% task failure rates

Input Manipulation Attack visionmultimodalnlp
PDF Code
attack arXiv Apr 1, 2026 · 5d ago

Enhancing Gradient Inversion Attacks in Federated Learning via Hierarchical Feature Optimization

Hao Fang, Wenbo Yu, Bin Chen et al. · Tsinghua University · Harbin Institute of Technology

GAN-based gradient inversion attack reconstructing client training data from FL gradients via hierarchical feature optimization

Model Inversion Attack visionfederated-learning
PDF
defense arXiv Mar 31, 2026 · 6d ago

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Yubo Cui, Xianchao Guan, Zijun Xiong et al. · Harbin Institute of Technology · Shenzhen Loop Area Institute

Adversarial fine-tuning framework that preserves vision-language alignment while defending CLIP against adversarial perturbations in zero-shot settings

Input Manipulation Attack visionnlpmultimodal
PDF Code
defense arXiv Mar 25, 2026 · 12d ago

Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection

Jielun Peng, Yabin Wang, Yaqi Li et al. · Harbin Institute of Technology

Multimodal deepfake detector learning audio-visual coherence patterns to identify synthetic videos from commercial generators

Output Integrity Attack multimodalaudiovision
PDF Code
attack arXiv Mar 24, 2026 · 13d ago

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu et al. · Zhejiang University · Harbin Institute of Technology

Targeted adversarial patch attack hijacks VLA robotic control by corrupting CoT reasoning to induce specific malicious behaviors

Input Manipulation Attack multimodalvisionnlp
PDF
attack arXiv Mar 18, 2026 · 19d ago

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack visiongenerative
PDF
defense arXiv Mar 12, 2026 · 25d ago

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai, Zitong Yu, Jun Wang et al. · Great Bay University · Shenzhen University +2 more

Forensic-aware visual token pruning for deepfake/AIGC detection VLMs using Birth-Death Optimal Transport to preserve manipulation traces

Output Integrity Attack visionmultimodalnlp
PDF Code
attack arXiv Mar 2, 2026 · 5w ago

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Duoxun Tang, Dasen Dai, Jiyao Wang et al. · Tsinghua University · The Chinese University of Hong Kong +4 more

Universal sponge attack on Video-LLMs inflates token generation 205× and inference latency 15× via optimized adversarial video frame triggers

Input Manipulation Attack Model Denial of Service multimodalvisionnlp
PDF Code
defense arXiv Mar 2, 2026 · 5w ago

Explanation-Guided Adversarial Training for Robust and Interpretable Models

Chao Chen, Yanhui Chen, Shanshan Lin et al. · Harbin Institute of Technology · Fuzhou University +1 more

Adversarial training framework combining explanation-guided constraints to improve robustness and saliency map stability against adversarial attacks

Input Manipulation Attack vision
PDF
tool arXiv Feb 21, 2026 · 6w ago

FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

Zhou Liu, Tonghua Su, Hongshi Zhang et al. · Harbin Institute of Technology · DZ-Matrix +3 more

Multimodal LLM system detects and localizes AI-generated image forgeries by fusing RGB and frequency-domain forensic features

Output Integrity Attack visionmultimodal
PDF
defense arXiv Feb 12, 2026 · 7w ago

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Zhaoxin Wang, Jiaming Liang, Fengbin Zhu et al. · Xidian University · National University of Singapore +1 more

Defends LLM safety alignment against neuron pruning attacks by redistributing safety representations across the network via selective neuron freezing

Prompt Injection nlpmultimodal
PDF
defense arXiv Feb 9, 2026 · 8w ago

NutVLM: A Self-Adaptive Defense Framework against Full-Dimension Attacks for Vision Language Models in Autonomous Driving

Xiaoxu Peng, Dong Zhou, Jianwen Zhang et al. · Harbin Institute of Technology · Nanyang Technological University

Defends VLMs against adversarial patches and global perturbations via three-way detection and gradient-based corrective prompt purification

Input Manipulation Attack Prompt Injection visionmultimodal
PDF Code
attack arXiv Feb 9, 2026 · 8w ago

Generating Adversarial Events: A Motion-Aware Point Cloud Framework

Hongwei Ren, Youxin Jiang, Qifei Gu et al. · Harbin Institute of Technology

Proposes gradient-based adversarial attack on event-camera DNNs via point cloud bridge, achieving 100% success rate with minimal perturbation

Input Manipulation Attack vision
PDF
defense arXiv Feb 9, 2026 · 8w ago

Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks

Yanzhang Fu, Zizheng Guo, Jizhou Luo · Harbin Institute of Technology

Plug-and-play post-processing defense disrupts black-box score-based adversarial attacks by introducing loss ambiguity, surviving adaptive adversaries

Input Manipulation Attack vision
PDF
defense arXiv Feb 3, 2026 · 8w ago

Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective

Hao Fang, Tianyi Zhang, Tianqu Zhuang et al. · Tsinghua University · Harbin Institute of Technology

Defends proprietary LLMs from distillation-based theft by minimizing conditional mutual information in model logit outputs

Model Theft Model Theft nlp
PDF
defense arXiv Feb 1, 2026 · 9w ago

Who Transfers Safety? Identifying and Targeting Cross-Lingual Shared Safety Neurons

Xianhui Zhang, Chengyu Xie, Linxia Zhu et al. · Nanjing University of Science and Technology · National University of Singapore +2 more

Identifies sparse cross-lingual safety neurons in LLMs and proposes targeted fine-tuning to close multilingual jailbreak safety gaps

Prompt Injection nlp
PDF Code
benchmark arXiv Jan 25, 2026 · 10w ago

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Jiahe Guo, Xiangran Guo, Yulin Hu et al. · Harbin Institute of Technology · Ltd

Personalized LLM agent memory biases intent inference, causing 15–244% higher attack success rates on harmful queries than stateless baselines

Prompt Injection nlp
PDF
defense arXiv Jan 8, 2026 · 12w ago

Defense Against Indirect Prompt Injection via Tool Result Parsing

Qiang Yu, Xinran Cheng, Chuanyi Liu · Harbin Institute of Technology

Defends LLM agents from indirect prompt injection by parsing and filtering tool call results to strip adversarial payloads

Prompt Injection Insecure Plugin Design nlp
3 citations PDF Code
defense arXiv Jan 7, 2026 · 12w ago

STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules

Di Wu, Yanyan Zhao, Xin Lu et al. · Harbin Institute of Technology

Self-improving safety alignment trains LLMs to iteratively reason over safety rules to resist jailbreak attacks

Prompt Injection nlp
1 citations PDF Code
defense arXiv Jan 1, 2026 · Jan 2026

ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching

Yi Sun, Xinhao Zhong, Hongyan Li et al. · Harbin Institute of Technology · Peng Cheng Laboratory +1 more

Training-free activation patching erases unsafe concepts from diffusion models, achieving SOTA safety with adversarial robustness

Output Integrity Attack visiongenerative
1 citations PDF
Loading more papers…