Latest papers

28 papers
tool arXiv Mar 23, 2026

FeatDistill: A Feature Distillation Enhanced Multi-Expert Ensemble Framework for Robust AI-generated Image Detection

Zhilin Tu, Kemou Li, Fengpeng Li et al. · University of Electronic Science and Technology of China · University of Macau +2 more

Multi-expert ensemble detector for AI-generated images robust to degradations, using CLIP/SigLIP transformers with feature distillation

Output Integrity Attack vision generative
PDF
attack arXiv Mar 23, 2026

Thermal Topology Collapse: Universal Physical Patch Attacks on Infrared Vision Systems

Chengyin Hu, Yikun Guo, Yuxian Dong et al. · China University of Petroleum-Beijing · University of Electronic Science and Technology of China +3 more

Universal adversarial patch attack on infrared pedestrian detectors using parameterized Bézier curves and cold patches

Input Manipulation Attack vision
PDF
defense arXiv Mar 19, 2026

Functional Subspace Watermarking for Large Language Models

Zikang Ding, Junhao Li, Suling Wu et al. · University of Electronic Science and Technology of China · Mohamed bin Zayed University of Artificial Intelligence +1 more

Embeds ownership watermarks in a low-dimensional functional subspace of LLM weights, surviving fine-tuning, quantization, and distillation attacks

Model Theft nlp
PDF
attack arXiv Mar 12, 2026

KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

Qizhi Chen, Chao Qi, Yihong Huang et al. · University of Electronic Science and Technology of China

Poisons GraphRAG knowledge graphs by forging fake knowledge evolution paths to hijack LLM query responses

Data Poisoning Attack Prompt Injection nlp graph
PDF
attack arXiv Mar 12, 2026

Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

Zikang Ding, Haomiao Yang, Meng Hao et al. · University of Electronic Science and Technology of China · Singapore Management University +2 more

Proposes temporally-delayed backdoor attacks on NLP pre-trained models using common everyday words as stealthy triggers

Model Poisoning nlp
PDF
attack arXiv Mar 4, 2026

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Junchen Li, Chao Qi, Rongzheng Wang et al. · University of Electronic Science and Technology of China · Fudan University +1 more

Poisons RAG knowledge bases with alignment-exploiting documents that transfer blocking attacks across 7 LLMs with 96% success

Data Poisoning Attack Prompt Injection nlp
PDF
defense arXiv Feb 24, 2026

Robust Spiking Neural Networks Against Adversarial Attacks

Shuai Wang, Malu Zhang, Yulin Jiang et al. · University of Electronic Science and Technology of China · National University of Singapore +2 more

Defends Spiking Neural Networks against adversarial attacks by pushing membrane potentials away from firing thresholds and adding probabilistic noise

Input Manipulation Attack vision
PDF
defense arXiv Feb 7, 2026

UTOPIA: Unlearnable Tabular Data via Decoupled Shortcut Embedding

Jiaming He, Fuming Luo, Hongwei Li et al. · University of Electronic Science and Technology of China · Independent Researcher +2 more

Protects private tabular data from unauthorized training by injecting decoupled shortcut perturbations that drive models to near-random performance

Data Poisoning Attack tabular
PDF
defense arXiv Feb 3, 2026

Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility

Mengxuan Wang, Yuxin Chen, Gang Xu et al. · South China University of Technology · Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) +2 more

Training-free VLM defense that amplifies risk signals in visual tokens to block multimodal jailbreak attacks without utility loss

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
attack arXiv Feb 3, 2026

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

Tianxin Chen, Wenbo Jiang, Hongqiao Chen et al. · Fudan University · University of Electronic Science and Technology of China +1 more

Backdoor attack on T2I diffusion models using semantic-space triggers that evade enumeration and attention-consistency defenses with a 100% attack success rate

Model Poisoning vision nlp generative multimodal
PDF
defense TDSC Jan 17, 2026

Decoder Gradient Shields: A Family of Provable and High-Fidelity Methods Against Gradient-Based Box-Free Watermark Removal

Haonan An, Guang Hua, Wei Du et al. · City University of Hong Kong · Singapore Institute of Technology +3 more

Defends box-free model watermarks in generative model outputs against gradient-leakage-based removal attacks using provable gradient-manipulation shields

Output Integrity Attack vision generative
1 citation PDF
attack arXiv Jan 7, 2026

State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

Ji Guo, Wenbo Jiang, Yansong Lin et al. · University of Electronic Science and Technology of China · Nanyang Technological University +1 more

Backdoor attack on VLA robotics models using the robot arm's initial state as trigger, stealthily achieving a >90% attack success rate

Model Poisoning Data Poisoning Attack vision multimodal
1 citation PDF
defense arXiv Jan 5, 2026

FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks

Chenyu Hu, Qiming Hu, Sinan Chen et al. · Southwest University · University of Electronic Science and Technology of China +3 more

Defends federated learning against adaptive backdoor attacks using dynamic gradient scaling and robust core-set aggregation

Model Poisoning federated-learning vision
PDF
attack arXiv Dec 3, 2025

Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Haidong Kang, Wei Wu, Hanling Wang · Northeastern University · University of Electronic Science and Technology of China +1 more

Uses LLMs with PPO reinforcement learning to auto-discover adversarial attacks that outperform PGD/FGSM against few-shot class-incremental learning systems

Input Manipulation Attack vision nlp
PDF
attack arXiv Nov 26, 2025

TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models

Jiaming He, Guanyu Hou, Hongwei Li et al. · University of Electronic Science and Technology of China · University of Manchester +3 more

Automated red-teaming framework crafts temporally-aware prompts to jailbreak T2V model safety filters, achieving 80%+ attack success rate

Prompt Injection vision nlp generative multimodal
PDF
attack arXiv Nov 25, 2025

Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation

Changyue Li, Jiaying Li, Youliang Yuan et al. · The Chinese University of Hong Kong · University of Electronic Science and Technology of China +1 more

Universal adversarial image perturbation semantically routes MLLM inputs to multiple distinct attacker-defined targets simultaneously

Input Manipulation Attack Prompt Injection vision multimodal nlp
PDF
tool arXiv Nov 18, 2025

ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation

Zitong Xu, Huiyu Duan, Xiaoyu Wang et al. · Shanghai Jiao Tong University · University of Electronic Science and Technology of China +1 more

Proposes ManipBench (450K AI-edited images, 25 models) and MLLM-based ManipShield for unified manipulation detection, localization, and explanation

Output Integrity Attack vision multimodal
PDF
attack TrustCom Nov 17, 2025

ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models

Siyang Cheng, Gaotian Liu, Rui Mei et al. · iFLYTEK · Anhui SparkShield Intelligent Technology +5 more

Evolutionary jailbreak framework using multi-level text perturbations and semantic fitness to bypass LLM alignment at high success rates

Prompt Injection nlp
PDF
defense International Journal of Compu... Nov 14, 2025

Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm

Fuxiang Huang, Xiaowei Fu, Shiyu Ye et al. · Chongqing University · Lingnan University +3 more

Defends unsupervised domain adaptation models against adversarial attacks via disentangled distillation post-training

Input Manipulation Attack vision
PDF
attack arXiv Nov 12, 2025

Boosting Adversarial Transferability via Ensemble Non-Attention

Yipeng Zou, Qin Liu, Jie Wu et al. · Hunan University · China Telecom +2 more

Ensemble adversarial attack leveraging non-attention regions and meta-learning to boost black-box transferability across CNNs and ViTs

Input Manipulation Attack vision
PDF