Latest papers

5 papers
attack arXiv Apr 2, 2026

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models with 96.7% task failure rates

Input Manipulation Attack vision multimodal nlp
PDF Code
defense arXiv Dec 20, 2025

Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks

Yucheng Fan, Jiawei Chen, Yu Tian et al. · East China Normal University · Zhongguancun Academy +1 more

Adversarial image perturbations shield social-media photos from VLM-based private attribute inference while preserving visual quality

Input Manipulation Attack vision multimodal
PDF
defense arXiv Nov 9, 2025

KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs

Shuyuan Liu, Jiawei Chen, Xiao Yang et al. · East China Normal University · Zhongguancun Academy +1 more

Knowledge graph-based black-box defense that detects jailbreak intent via semantic parsing without accessing LLM internals

Prompt Injection nlp
PDF
defense arXiv Aug 18, 2025

RAJ-PGA: Reasoning-Activated Jailbreak and Principle-Guided Alignment Framework for Large Reasoning Models

Jianhao Chen, Mayi Xu, Haoyang Chen et al. · Wuhan University · Zhongguancun Academy +2 more

Jailbreaks Large Reasoning Models via prompt concretization targeting CoT reasoning, then builds a safety alignment dataset that improves defense by 29.5%

Prompt Injection nlp
PDF Code
attack arXiv Aug 12, 2025

SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Shixuan Sun, Siyuan Liang, Ruoyu Chen et al. · Sun Yat-Sen University · University of Chinese Academy of Sciences +3 more

Source-aware membership inference audit for RAG/MRAG systems attributing outputs to training data, retrieval, or user input via zero-order optimization

Membership Inference Attack Sensitive Information Disclosure nlp multimodal
PDF