Latest papers

337 papers
defense arXiv Apr 1, 2026 · 5d ago

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun, Wanwei Liu, Haoang Chi et al. · National University of Defense Technology · Nanjing University +1 more

Interpretable DNN repair using Shapley-guided fault localization and derivative-free optimization for backdoor removal, adversarial defense, and fairness repair

Input Manipulation Attack · Model Poisoning · vision
PDF
survey arXiv Apr 1, 2026 · 5d ago

Safety, Security, and Cognitive Risks in World Models

Manoj Parmar · SovereignAI Security Labs

Unified threat model for world model AI systems covering adversarial attacks, data poisoning, alignment risks, and cognitive security

Input Manipulation Attack · Data Poisoning Attack · Model Poisoning · Prompt Injection · Excessive Agency · reinforcement-learning · multimodal · vision · nlp
PDF
attack arXiv Apr 1, 2026 · 5d ago

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou et al. · Huazhong University of Science and Technology · Hubei University

Embeds latent trojans in individually safe LLMs that activate during model merging, bypassing safety alignment

Model Poisoning · AI Supply Chain Attacks · Prompt Injection · nlp
PDF
attack arXiv Apr 1, 2026 · 5d ago

Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning

Swapnil Parekh · Intuit

Backdoor attack on tokenless reasoning models that hijacks continuous latent trajectories via single embedding perturbations, achieving 99%+ success while evading all token-level defenses

Model Poisoning · Data Poisoning Attack · nlp
PDF
attack arXiv Mar 31, 2026 · 6d ago

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi · Purdue University

Backdoor attack on federated learning using semantic triggers like sunglasses that evade robust aggregation defenses

Model Poisoning · Data Poisoning Attack · vision · federated-learning
PDF
attack arXiv Mar 30, 2026 · 7d ago

InkDrop: Invisible Backdoor Attacks Against Dataset Condensation

He Yang, Dongyi Lv, Song Ma et al. · Xi'an Jiaotong University · Tsinghua University

Stealthy backdoor attack on dataset condensation using boundary-proximate samples and imperceptible perturbations to evade detection

Model Poisoning · vision
PDF Code
defense arXiv Mar 30, 2026 · 7d ago

Mitigating Backdoor Attacks in Federated Learning Using PPA and MiniMax Game Theory

Osama Wehbi, Sarhad Arisdakessian, Omar Abdel Wahab et al. · Polytechnique Montréal · Institut national de la recherche scientifique +2 more

Defends federated learning against backdoor attacks using reputation systems, game theory, and statistical analysis to reduce attack success to 1-11%

Model Poisoning · Data Poisoning Attack · vision · federated-learning
PDF
defense arXiv Mar 30, 2026 · 7d ago

FL-PBM: Pre-Training Backdoor Mitigation for Federated Learning

Osama Wehbi, Sarhad Arisdakessian, Omar Abdel Wahab et al. · Polytechnique Montréal · Khalifa University +2 more

Client-side defense that detects and blurs backdoored training data in federated learning using PCA and GMM clustering

Model Poisoning · vision · federated-learning
PDF
attack arXiv Mar 29, 2026 · 8d ago

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang et al. · Hong Kong University of Science and Technology · The Chinese University of Hong Kong +2 more

Semantic backdoor attack on VLMs that injects ads when users ask recommendation questions about specific content categories

Model Poisoning · multimodal · vision · nlp
PDF
attack arXiv Mar 26, 2026 · 11d ago

On the Vulnerability of Deep Automatic Modulation Classifiers to Explainable Backdoor Threats

Younes Salmi, Hanna Bogucka · Poznan University of Technology

XAI-guided backdoor attack on wireless signal classifiers achieving high success with physically embedded triggers at low poisoning rates

Model Poisoning · timeseries
PDF
attack arXiv Mar 26, 2026 · 11d ago

Physical Backdoor Attack Against Deep Learning-Based Modulation Classification

Younes Salmi, Hanna Bogucka · Poznan University of Technology

Physical backdoor attack using power amplifier distortions as triggers to compromise RF modulation classifiers at training time

Model Poisoning · timeseries
PDF
survey arXiv Mar 25, 2026 · 12d ago

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan · University of Central Florida · University of Copenhagen

Unified taxonomy of ML security threats organizing attacks into data-to-data, data-to-model, model-to-data, and model-to-model categories

Input Manipulation Attack · Data Poisoning Attack · Model Inversion Attack · Membership Inference Attack · Model Theft · Output Integrity Attack · Model Poisoning · Prompt Injection · Sensitive Information Disclosure · vision · nlp · multimodal
PDF
defense arXiv Mar 24, 2026 · 13d ago

SafeSeek: Universal Attribution of Safety Circuits in Language Models

Miao Yu, Siyuan Fu, Moayad Aloqaily et al. · University of Science and Technology of China · Squirrel AI Learning +4 more

Mechanistic interpretability framework identifying sparse safety circuits in LLMs for backdoor removal and alignment preservation

Model Poisoning · Input Manipulation Attack · Prompt Injection · nlp
PDF
attack arXiv Mar 24, 2026 · 13d ago

AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents

Yutao Luo, Haotian Zhu, Shuchao Pang et al. · Nanjing University of Science and Technology · Macquarie University +3 more

Backdoor attack on mobile GUI agents using benign notification icons to trigger malicious actions with 90%+ success rate

Model Poisoning · vision · multimodal
PDF
attack arXiv Mar 24, 2026 · 13d ago

PoiCGAN: A Targeted Poisoning Based on Feature-Label Joint Perturbation in Federated Learning

Tao Liu, Jiguang Lv, Dapeng Man et al. · Harbin Engineering University

Targeted federated learning poisoning attack using CGAN-based sample generation achieving 84% higher success than baselines while evading detection

Data Poisoning Attack · Model Poisoning · vision · federated-learning
PDF
attack arXiv Mar 20, 2026 · 17d ago

Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs

Qi Luo, Minghui Xu, Dongxiao Yu et al. · Shandong University

Text-only backdoor attack on graph neural networks that poisons node text while preserving graph structure, achieving near-perfect attack success rates

Model Poisoning · Data Poisoning Attack · nlp · graph
PDF
benchmark arXiv Mar 20, 2026 · 17d ago

Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition

Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa et al. · KP Labs · Silesian University of Technology +4 more

Kaggle competition benchmark for detecting backdoor triggers in time series forecasting models for spacecraft telemetry

Model Poisoning · timeseries
PDF Code
defense arXiv Mar 19, 2026 · 18d ago

Beyond Passive Aggregation: Active Auditing and Topology-Aware Defense in Decentralized Federated Learning

Sheng Pan, Niansheng Tang · Yunnan University

Active auditing framework using stochastic probes to detect adaptive backdoors in decentralized federated learning networks

Model Poisoning · federated-learning
PDF
defense arXiv Mar 18, 2026 · 19d ago

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang et al. · Zhejiang University · Xi'an Jiaotong University +1 more

Black-box backdoor detector for speech models exploiting dual stability anomalies under semantic-breaking and semantic-preserving perturbations

Model Poisoning · audio
PDF
attack arXiv Mar 17, 2026 · 20d ago

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation introducing six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning · Data Poisoning Attack · vision
PDF