Latest papers

37 papers
defense arXiv Mar 20, 2026 · 17d ago

Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang et al. · Northwest Institute of Nuclear Technology · Tsinghua University +1 more

Unifies adversarial robustness and LLM hallucination under a geometric uncertainty principle, proposing defenses without adversarial training

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
defense arXiv Mar 18, 2026 · 19d ago

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang et al. · Zhejiang University · Xi’an Jiaotong University +1 more

Black-box backdoor detector for speech models exploiting dual stability anomalies under semantic-breaking and semantic-preserving perturbations

Model Poisoning audio
PDF
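The summary only names the idea, so here is a rough illustrative sketch (not the paper's STEP algorithm): measure how often a speech model's prediction survives each of the two perturbation families and flag samples whose prediction is anomalously stable under semantic-breaking noise. The `model`, `semantic_breaking`, and `semantic_preserving` callables are placeholders.

```python
import numpy as np

def stability(model, x, perturb, n_trials=20):
    """Fraction of perturbed copies that keep the clean prediction (1.0 = fully stable)."""
    clean_label = int(np.argmax(model(x)))
    hits = sum(int(np.argmax(model(perturb(x)))) == clean_label for _ in range(n_trials))
    return hits / n_trials

def dual_stability_profile(model, x, semantic_breaking, semantic_preserving):
    """Profile a sample under both perturbation families.

    A clean utterance is usually unstable under semantic-breaking noise but stable
    under semantic-preserving noise; a sample whose prediction survives both is
    suspicious (any thresholding strategy on this profile is hypothetical).
    """
    return {
        "breaking": stability(model, x, semantic_breaking),
        "preserving": stability(model, x, semantic_preserving),
    }
```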
attack arXiv Mar 8, 2026 · 29d ago

Hide and Find: A Distributed Adversarial Attack on Federated Graph Learning

Jinshan Liu, Ken Li, Jiazhe Wei et al. · Xi’an Jiaotong University

Proposes FedShift, a two-stage distributed attack combining covert data poisoning with efficient multi-client adversarial perturbation on federated graph learning

Input Manipulation Attack Data Poisoning Attack graph federated-learning
PDF
attack arXiv Mar 2, 2026 · 5w ago

Extracting Training Dialogue Data from Large Language Model based Task Bots

Shuo Zhang, Junzhou Zhao, Junji Hou et al. · Xi’an Jiaotong University

Extracts private training dialogue data from LLM task bots via novel response sampling and membership inference attack techniques

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Mar 2, 2026 · 5w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodal vision nlp generative
PDF
benchmark arXiv Jan 27, 2026 · 9w ago

Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection

Yao Xiao, Weiyan Chen, Jiahao Chen et al. · Sun Yat-Sen University · Xi’an Jiaotong University +3 more

Introduces X-AIGD benchmark with pixel-level perceptual artifact annotations to enable interpretable AI-generated image detection evaluation

Output Integrity Attack vision
PDF Code
benchmark arXiv Jan 12, 2026 · 12w ago

Small Symbols, Big Risks: Exploring Emoticon Semantic Confusion in Large Language Models

Weipeng Jiang, Xiaoyu Zhang, Juan Zhai et al. · Xi’an Jiaotong University · Nanyang Technological University +1 more

Finds that ASCII emoticons in prompts cause >38% semantic confusion in LLMs, producing syntactically valid but silently destructive failures in code generation

Prompt Injection nlp
PDF
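A minimal probe in the spirit of this finding (not the paper's benchmark) simply diffs a model's completion on a prompt with and without an appended ASCII emoticon; `generate` is a placeholder for whatever LLM call you use, and the emoticon list is illustrative.

```python
EMOTICONS = [":-)", "(>_<)", "(o_O)", "\\(^o^)/"]  # small illustrative set

def emoticon_confusion_cases(generate, prompt):
    """Return emoticon variants whose completion differs from the clean prompt's.

    generate: callable str -> str wrapping an LLM (placeholder, not a real API)
    """
    baseline = generate(prompt)
    diffs = []
    for emo in EMOTICONS:
        variant = f"{prompt} {emo}"
        if generate(variant) != baseline:
            diffs.append((emo, variant))
    return diffs
```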
benchmark arXiv Jan 9, 2026 · 12w ago

The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence

Herun Wan, Jiaying Wu, Minnan Luo et al. · Xi’an Jiaotong University · National University of Singapore +1 more

Benchmarks LLM vulnerability to sophisticated fabricated evidence and proposes DIS defense to shield beliefs against indirect context manipulation

Prompt Injection nlp
PDF Code
defense USENIX Security Dec 17, 2025 · Dec 2025

From Risk to Resilience: Towards Assessing and Mitigating the Risk of Data Reconstruction Attacks in Federated Learning

Xiangrui Xu, Zhize Li, Yufei Han et al. · Beijing Jiaotong University · Singapore Management University +3 more

Theoretical framework quantifying data reconstruction attack risk in federated learning via Jacobian spectral analysis, with adaptive noise defenses

Model Inversion Attack federated-learning vision
1 citation PDF
defense arXiv Dec 8, 2025 · Dec 2025

Pay Less Attention to Function Words for Free Robustness of Vision-Language Models

Qiwei Tian, Chenhao Lin, Zhengyu Zhao et al. · Xi’an Jiaotong University

Defends VLMs against cross-modal adversarial attacks by suppressing attention to function words, cutting ASR by up to 90%

Input Manipulation Attack multimodal vision nlp
PDF Code
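As a rough sketch of the general idea (the paper's exact reweighting rule and word list are not reproduced here), one can shrink the attention mass a cross-attention layer assigns to function-word positions and renormalize; the word set, the `alpha` factor, and the use of PyTorch are all assumptions.

```python
import torch

FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "is"}  # illustrative subset

def suppress_function_word_attention(attn, tokens, alpha=0.1):
    """Downweight attention paid to function-word positions, then renormalize.

    attn:   (heads, seq, seq) row-softmaxed attention weights
    tokens: list of seq token strings aligned with attn's last dimension
    alpha:  suppression factor for function-word columns (hypothetical value)
    """
    scale = torch.ones(len(tokens))
    for i, tok in enumerate(tokens):
        if tok.lower().strip() in FUNCTION_WORDS:
            scale[i] = alpha
    attn = attn * scale.view(1, 1, -1)            # shrink function-word columns
    return attn / attn.sum(dim=-1, keepdim=True)  # rows sum to 1 again
```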
benchmark arXiv Dec 6, 2025 · Dec 2025

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Xiaojun Jia, Jie Liao, Qi Guo et al. · Nanyang Technological University · BraneMatrix AI +7 more

Unified benchmark and toolbox evaluating 13 attack methods and 15 defenses against multimodal jailbreaks across 18 open- and closed-source MLLMs

Prompt Injection multimodal nlp vision
5 citations PDF Code
attack arXiv Dec 2, 2025 · Dec 2025

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Yuan Xiong, Ziqi Miao, Lijun Li et al. · Shanghai Artificial Intelligence Laboratory · Xi’an Jiaotong University +1 more

Jailbreaks multimodal LLMs by embedding harmful queries in crafted visual contexts via a multi-agent image generation system

Prompt Injection vision multimodal nlp
PDF
attack arXiv Nov 20, 2025 · Nov 2025

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

Zhen Sun, Zongmin Zhang, Deqi Liang et al. · The Hong Kong University of Science and Technology · East China Normal University +5 more

Game-theoretic black-box jailbreak using Prisoner's Dilemma scenarios to flip LLM safety preferences, achieving 95%+ ASR on GPT-4o and DeepSeek-R1

Prompt Injection nlp
2 citations PDF Code
attack arXiv Nov 19, 2025 · Nov 2025

What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs

Zhihan Ren, Lijun He, Jiaxi Liang et al. · Xi’an Jiaotong University

Black-box feature inversion attack on split DNNs reconstructs private inputs from intermediate features with high fidelity using flow matching

Model Inversion Attack vision
1 citation PDF
attack arXiv Nov 17, 2025 · Nov 2025

T2I-Based Physical-World Appearance Attack against Traffic Sign Recognition Systems in Autonomous Driving

Chen Ma, Ningfei Wang, Junhao Zheng et al. · Xi’an Jiaotong University · University of California +2 more

T2I diffusion-based physical adversarial appearance attack fools traffic sign classifiers with 83.3% real-world success rate

Input Manipulation Attack vision
PDF
defense arXiv Nov 11, 2025 · Nov 2025

Multi-modal Deepfake Detection and Localization with FPN-Transformer

Chende Zheng, Ruiqi Suo, Zhoulin Ji et al. · Xi’an Jiaotong University

Novel FPN-Transformer framework detects and localizes deepfake forgeries across audio-visual modalities with temporal boundary regression

Output Integrity Attack multimodal audio vision
PDF Code
attack arXiv Nov 11, 2025 · Nov 2025

MSCR: Exploring the Vulnerability of LLMs' Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement

Zhishen Sun, Guang Dai, Haishan Ye · Xi’an Jiaotong University · SGIT AI Lab

Word-substitution attack exposes LLM math reasoning fragility, dropping accuracy by up to 50% on GSM8K

Prompt Injection nlp
PDF
defense arXiv Nov 10, 2025 · Nov 2025

A Theoretical Analysis of Detecting Large Model-Generated Time Series

Junji Hou, Junzhou Zhao, Shuo Zhang et al. · Xi’an Jiaotong University

Proves a contraction hypothesis and proposes UCE to detect AI-generated time series via recursive forecasting uncertainty

Output Integrity Attack timeseries
2 citations PDF
defense arXiv Nov 10, 2025 · Nov 2025

Privacy on the Fly: A Predictive Adversarial Transformation Network for Mobile Sensor Data

Tianle Song, Chenhao Lin, Yang Cao et al. · Xi’an Jiaotong University · Institute of Science Tokyo

Defends mobile sensor privacy by predictively generating adversarial perturbations that fool ML attribute-inference models in real time

Input Manipulation Attack timeseries
PDF
attack arXiv Nov 7, 2025 · Nov 2025

Learning Fourier shapes to probe the geometric world of deep neural networks

Jian Wang, Yixing Yong, Haixia Bi et al. · Xi’an Jiaotong University

Differentiable Fourier shape optimization creates geometry-only adversarial inputs that fool vision classifiers without texture perturbations

Input Manipulation Attack vision
PDF