Latest papers

11 papers
defense arXiv Mar 1, 2026

S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights

Gaojie Jin, Xinping Yi, Wei Huang et al. · University of Exeter · Southeast University +1 more

Improves adversarial training robustness by optimizing second-order weight statistics via a tightened PAC-Bayesian bound

Input Manipulation Attack vision
PDF Code
attack arXiv Feb 27, 2026

Adversarial Patch Generation for Visual-Infrared Dense Prediction Tasks via Joint Position-Color Optimization

He Li, Wenyue He, Weihang Kong et al. · Yanshan University · University of Exeter

Black-box adversarial patch attack jointly optimizes position and color to fool visual-infrared multimodal dense prediction models

Input Manipulation Attack vision multimodal
PDF
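The entry above describes a black-box patch attack that searches over patch position and fill color together. A minimal illustrative sketch of that idea (not the paper's method) is query-based random search: `toy_detector_score` is a hypothetical stand-in for the victim model, and position and color are sampled jointly each query.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_detector_score(image):
    # Stand-in for a black-box dense-prediction model:
    # returns a scalar "confidence" the attacker wants to drive down.
    return float(image.mean())

def random_search_patch(image, patch=8, iters=200):
    """Jointly sample patch position and fill color, keep the best query."""
    h, w = image.shape
    best_params = None
    best_score = toy_detector_score(image)
    for _ in range(iters):
        y = rng.integers(0, h - patch)
        x = rng.integers(0, w - patch)
        color = rng.random()  # single-channel "color" in [0, 1)
        cand = image.copy()
        cand[y:y + patch, x:x + patch] = color
        score = toy_detector_score(cand)
        if score < best_score:
            best_score, best_params = score, (int(y), int(x), color)
    return best_params, best_score

img = rng.random((32, 32))
best_params, best_score = random_search_patch(img)
```

The paper's actual optimizer and the visual-infrared multimodal objective are more involved; this only shows the joint position-color search loop in the black-box query setting.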
defense IEEE Transactions on Image Pro... Jan 23, 2026

StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors

Qinkai Yu, Chong Zhang, Gaojie Jin et al. · University of Exeter · King Abdullah University of Science and Technology +6 more

Embeds backdoor-based watermarks in medical segmentation models to verify ownership under black-box API conditions

Model Theft vision
PDF Code
defense arXiv Dec 1, 2025

CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing

Zixia Wang, Gaojie Jin, Jia Hu et al. · University of Exeter

Certifies LLM robustness against synonym substitution attacks via clustering-guided denoising smoothing with tighter bounds

Input Manipulation Attack Prompt Injection nlp
PDF
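CluCERT builds on randomized smoothing: the smoothed classifier takes a majority vote over randomly perturbed copies of the input. A toy sketch of that base mechanism, assuming a hypothetical synonym table and classifier (the paper's clustering-guided denoising and certified bounds are omitted):

```python
import random

# Illustrative stand-ins, not the paper's components.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}

def toy_classifier(tokens):
    # Label 1 if any positive word is present, else 0.
    positive = {"good", "great", "fine"}
    return int(any(t in positive for t in tokens))

def smoothed_predict(tokens, n=100, sub_prob=0.3, seed=0):
    """Majority vote over random synonym substitutions of the input."""
    rng = random.Random(seed)
    votes = [0, 0]
    for _ in range(n):
        noisy = [
            rng.choice(SYNONYMS[t]) if t in SYNONYMS and rng.random() < sub_prob else t
            for t in tokens
        ]
        votes[toy_classifier(noisy)] += 1
    return max((0, 1), key=lambda c: votes[c]), votes

label, votes = smoothed_predict(["this", "movie", "is", "good"])
```

A certificate against synonym-substitution attacks would then lower-bound the top class's vote probability; that statistical step is where the paper's tighter bounds apply.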
attack arXiv Nov 13, 2025

BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

Shuaitong Liu, Renjue Li, Lijia Yu et al. · Southwest University · Chinese Academy of Sciences +1 more

Backdoor attack poisons LLM fine-tuning to trigger 17x CoT trace inflation for stealthy compute exhaustion

Model Poisoning Model Denial of Service nlp
1 citation PDF
attack arXiv Nov 1, 2025

Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling

Zenghao Niu, Weicheng Xie, Siyang Song et al. · Shenzhen University · Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) +3 more

Gradient-guided sampling attack improves adversarial transferability across DNNs and VLMs by balancing loss flatness and attack potency

Input Manipulation Attack Prompt Injection vision multimodal
PDF Code
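A common ingredient behind transferability methods like the one above is averaging gradients over a sampled neighborhood of the current iterate, which biases the attack toward flat loss regions. A generic sketch under a toy quadratic surrogate loss (`surrogate_loss_grad` is a hypothetical stand-in for a DNN's gradient; this is not the paper's exact sampling scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_loss_grad(x):
    # Gradient of a toy surrogate loss ||x - target||^2 (stand-in for a DNN).
    target = np.ones_like(x)
    return 2.0 * (x - target)

def sampled_gradient_step(x, step=0.1, n_samples=8, radius=0.05):
    """Average gradients at randomly sampled neighbors, then take a sign step."""
    grads = [surrogate_loss_grad(x + rng.normal(0.0, radius, x.shape))
             for _ in range(n_samples)]
    g = np.mean(grads, axis=0)
    # Sign-based ascent step, as in FGSM-style transfer attacks.
    return x + step * np.sign(g)

x0 = np.zeros(4)
x1 = sampled_gradient_step(x0)
```

Iterating this step with a projection back onto the perturbation budget gives the usual transfer-attack loop; the paper's contribution is in how the samples balance exploration and exploitation.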
defense arXiv Sep 30, 2025

Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier

Gaojie Jin, Xinping Yi, Xiaowei Huang · University of Exeter · Southeast University +1 more

Derives PAC-Bayesian certified robustness bounds for smoothed majority vote classifiers and proposes spectral regularization to improve robustness-accuracy tradeoff

Input Manipulation Attack vision
1 citation PDF
attack arXiv Sep 27, 2025

Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models

Xiaotian Zou · University of Exeter

RL-based attack steals proprietary text-to-image prompt templates from example images at 87% lower cost than prior methods

Model Theft vision generative
PDF
defense arXiv Aug 30, 2025

Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models

Sihao Wu, Gaojie Jin, Wei Huang et al. · University of Liverpool · University of Exeter +2 more

Defends VLMs against visual adversarial jailbreaks via adaptive activation steering vectors refined through sequence-level preference optimization

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
attack arXiv Aug 23, 2025

POT: Inducing Overthinking in LLMs via Black-Box Iterative Optimization

Xinyu Li, Tianjin Huang, Ronghui Mu et al. · University of Exeter · University of Liverpool

Black-box adversarial prompts exploit CoT reasoning to inflate LLM token generation and exhaust compute resources

Model Denial of Service nlp
PDF
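The black-box iterative optimization described above can be illustrated as a hill-climbing search over prompt suffixes, where the attacker only observes output length. Everything here is a hypothetical stand-in (`toy_llm_tokens` simulates an LLM whose responses grow with reasoning-inducing phrases), not the POT method itself:

```python
import random

def toy_llm_tokens(prompt):
    # Stand-in for a black-box LLM: response length grows with
    # occurrences of reasoning-inducing phrases in the prompt.
    return 50 + 40 * prompt.count("step by step") + 25 * prompt.count("verify")

CANDIDATE_PHRASES = ["step by step", "verify", "briefly"]

def hill_climb_prompt(base, rounds=20, seed=0):
    """Black-box iterative search for a suffix that inflates output length."""
    rng = random.Random(seed)
    prompt, best = base, toy_llm_tokens(base)
    for _ in range(rounds):
        cand = prompt + " " + rng.choice(CANDIDATE_PHRASES)
        cost = toy_llm_tokens(cand)
        if cost > best:  # keep only queries that increase compute cost
            prompt, best = cand, cost
    return prompt, best

prompt, tokens = hill_climb_prompt("solve 2+2")
```

The real attack optimizes adversarial prompts against an actual model's CoT behavior; the sketch only shows the query-feedback loop that makes the attack black-box.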
survey arXiv Aug 7, 2025

Safety of Embodied Navigation: A Survey

Zixia Wang, Jia Hu, Ronghui Mu · University of Exeter

Surveys attack strategies, defenses, and evaluation methods for the safety of LLM-powered embodied navigation agents

Input Manipulation Attack Model Poisoning Prompt Injection Excessive Agency multimodal reinforcement-learning nlp
PDF