Latest papers

44 papers
attack arXiv Apr 23, 2026 · 28d ago

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei, Ming Fan, Guoheng Sun et al. · Xi’an Jiaotong University

Style-based backdoor attack on LLMs using imperceptible triggers with auxiliary loss for stable payload injection across fine-tuning

Model Poisoning Training Data Poisoning nlp
PDF
benchmark arXiv Apr 17, 2026 · 4w ago

TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models

Chaoshuo Zhang, Yibo Liang, Mengke Tian et al. · Xi’an Jiaotong University · CISPA Helmholtz Center for Information Security

Benchmark evaluating compositional safety vulnerabilities in text-to-image models when benign concepts combine to create unsafe outputs

Input Manipulation Attack visiongenerative
PDF
attack arXiv Apr 17, 2026 · 4w ago

PoInit-of-View: Poisoning Initialization of Views Transfers Across Multiple 3D Reconstruction Systems

Weijie Wang, Songlong Xing, Zhengyu Zhao et al. · University of Trento · Fondazione Bruno Kessler +1 more

Adversarial attack poisoning input views to corrupt 3D reconstruction by targeting structure-from-motion initialization via cross-view gradient inconsistencies

Input Manipulation Attack vision
PDF
benchmark arXiv Apr 13, 2026 · 5w ago

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al. · Lomonosov Moscow State University · Shenzhen University +14 more

Competition report on robust deepfake detection across 42 generators and 36 image transformations with 20 final solutions

Output Integrity Attack visiongenerative
PDF
defense arXiv Apr 13, 2026 · 5w ago

Finetune Like You Pretrain: Boosting Zero-shot Adversarial Robustness in Vision-language Models

Songlong Xing, Weijie Wang, Zhengyu Zhao et al. · University of Trento · Fondazione Bruno Kessler +2 more

Adversarial finetuning for CLIP using web image-text pairs and contrastive learning to boost robustness across 14 domains

Input Manipulation Attack visionnlpmultimodal
PDF Code
defense arXiv Apr 6, 2026 · 6w ago

A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models

Tianmeng Fang, Yong Wang, Zetai Kong et al. · Singapore Management University · China University of Mining and Technology +4 more

Defends vision-language models against backdoors using patch augmentation and cross-view regularization to break trigger invariance

Model Poisoning multimodalvisionnlp
PDF
tool arXiv Apr 5, 2026 · 6w ago

ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity

Hang Wang, Chao Shen, Lei Zhang et al. · Xi’an Jiaotong University · The Hong Kong Polytechnic University +1 more

Detects AI-generated videos by exploiting anomalous temporal self-similarity patterns across visual and semantic modalities

Output Integrity Attack visionmultimodal
PDF Code
defense arXiv Mar 20, 2026 · 8w ago

Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang et al. · Northwest Institute of Nuclear Technology · Tsinghua University +1 more

Unifies adversarial robustness and LLM hallucination under a geometric uncertainty principle, proposing defenses without adversarial training

Input Manipulation Attack Prompt Injection visionnlpmultimodal
PDF
defense arXiv Mar 18, 2026 · 9w ago

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang et al. · Zhejiang University · Xi’an Jiaotong University +1 more

Black-box backdoor detector for speech models exploiting dual stability anomalies under semantic-breaking and semantic-preserving perturbations

Model Poisoning audio
PDF
attack arXiv Mar 8, 2026 · 10w ago

Hide and Find: A Distributed Adversarial Attack on Federated Graph Learning

Jinshan Liu, Ken Li, Jiazhe Wei et al. · Xi’an Jiaotong University

Proposes FedShift, a two-stage distributed attack combining covert data poisoning with efficient multi-client adversarial perturbation on federated graph learning

Input Manipulation Attack Data Poisoning Attack graphfederated-learning
PDF
defense arXiv Mar 2, 2026 · 11w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodalvisionnlpgenerative
PDF
attack arXiv Mar 2, 2026 · 11w ago

Extracting Training Dialogue Data from Large Language Model based Task Bots

Shuo Zhang, Junzhou Zhao, Junji Hou et al. · Xi’an Jiaotong University

Extracts private training dialogue data from LLM task bots via novel response sampling and membership inference attack techniques

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
benchmark arXiv Jan 27, 2026 · Jan 2026

Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection

Yao Xiao, Weiyan Chen, Jiahao Chen et al. · Sun Yat-Sen University · Xi’an Jiaotong University +3 more

Introduces X-AIGD benchmark with pixel-level perceptual artifact annotations to enable interpretable AI-generated image detection evaluation

Output Integrity Attack vision
PDF Code
benchmark arXiv Jan 12, 2026 · Jan 2026

Small Symbols, Big Risks: Exploring Emoticon Semantic Confusion in Large Language Models

Weipeng Jiang, Xiaoyu Zhang, Juan Zhai et al. · Xi’an Jiaotong University · Nanyang Technological University +1 more

Discovers ASCII emoticons in prompts cause >38% semantic confusion in LLMs, producing syntactically valid but destructive silent failures in code generation

Prompt Injection nlp
PDF
benchmark arXiv Jan 9, 2026 · Jan 2026

The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence

Herun Wan, Jiaying Wu, Minnan Luo et al. · Xi’an Jiaotong University · National University of Singapore +1 more

Benchmarks LLM vulnerability to sophisticated fabricated evidence and proposes DIS defense to shield beliefs against indirect context manipulation

Prompt Injection nlp
PDF Code
defense USENIX Security Dec 17, 2025 · Dec 2025

From Risk to Resilience: Towards Assessing and Mitigating the Risk of Data Reconstruction Attacks in Federated Learning

Xiangrui Xu, Zhize Li, Yufei Han et al. · Beijing Jiaotong University · Singapore Management University +3 more

Theoretical framework quantifying data reconstruction attack risk in federated learning via Jacobian spectral analysis, with adaptive noise defenses

Model Inversion Attack federated-learningvision
1 citations PDF
defense arXiv Dec 8, 2025 · Dec 2025

Pay Less Attention to Function Words for Free Robustness of Vision-Language Models

Qiwei Tian, Chenhao Lin, Zhengyu Zhao et al. · Xi’an Jiaotong University

Defends VLMs against cross-modal adversarial attacks by suppressing attention to function words, cutting ASR by up to 90%

Input Manipulation Attack multimodalvisionnlp
PDF Code
benchmark arXiv Dec 6, 2025 · Dec 2025

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Xiaojun Jia, Jie Liao, Qi Guo et al. · Nanyang Technological University · BraneMatrix AI +7 more

Unified benchmark and toolbox evaluating 13 attack methods and 15 defenses against multimodal jailbreaks across 18 open- and closed-source MLLMs

Prompt Injection multimodalnlpvision
5 citations PDF Code
attack arXiv Dec 2, 2025 · Dec 2025

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Yuan Xiong, Ziqi Miao, Lijun Li et al. · Shanghai Artificial Intelligence Laboratory · Xi’an Jiaotong University +1 more

Jailbreaks multimodal LLMs by embedding harmful queries in crafted visual contexts via a multi-agent image generation system

Prompt Injection visionmultimodalnlp
PDF
attack arXiv Nov 20, 2025 · Nov 2025

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

Zhen Sun, Zongmin Zhang, Deqi Liang et al. · The Hong Kong University of Science and Technology · East China Normal University +5 more

Game-theoretic black-box jailbreak using Prisoner's Dilemma scenarios to flip LLM safety preferences, achieving 95%+ ASR on GPT-4o and DeepSeek-R1

Prompt Injection nlp
2 citations PDF Code
Loading more papers…