Latest papers

12 papers
defense arXiv Mar 1, 2026 · 5w ago

S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights

Gaojie Jin, Xinping Yi, Wei Huang et al. · University of Exeter · Southeast University +1 more

Improves adversarial training robustness by optimizing second-order weight statistics via a tightened PAC-Bayesian bound

Input Manipulation Attack vision
PDF Code
defense IEEE Transactions on Image Pro... Jan 23, 2026 · 10w ago

StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors

Qinkai Yu, Chong Zhang, Gaojie Jin et al. · University of Exeter · King Abdullah University of Science and Technology +6 more

Embeds backdoor-based watermarks in medical segmentation models to verify ownership under black-box API conditions

Model Theft vision
PDF Code
attack arXiv Jan 18, 2026 · 11w ago

DDSA: Dual-Domain Strategic Attack for Spatial-Temporal Efficiency in Adversarial Robustness Testing

Jinwei Hu, Shiyuan Meng, Yi Dong et al. · University of Liverpool · Shanghai Artificial Intelligence Laboratory

Efficient adversarial attack using XAI-guided spatial targeting and temporal frame selection to reduce per-frame robustness testing overhead

Input Manipulation Attack vision
PDF
attack arXiv Jan 4, 2026 · Jan 2026

Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

Jinwei Hu, Xinmiao Huang, Youcheng Sun et al. · University of Liverpool · Mohamed bin Zayed University of Artificial Intelligence

Colluding LLM agents manipulate victim agents into false beliefs by coordinating truthful but deceptive evidence fragments across public channels

Prompt Injection nlp
PDF Code
defense arXiv Nov 27, 2025 · Nov 2025

Rethinking Cross-Generator Image Forgery Detection through DINOv3

Zhenglin Huang, Jason Li, Haiquan Wen et al. · University of Liverpool · Nanyang Technological University +3 more

Finds that a frozen DINOv3 detects cross-generator image forgeries via low-frequency cues; proposes a training-free token-ranking baseline

Output Integrity Attack vision generative
PDF
benchmark arXiv Nov 13, 2025 · Nov 2025

Fragile by Design: On the Limits of Adversarial Defenses in Personalized Generation

Zhen Chen, Yi Zhang, Xiangyu Yin et al. · University of Liverpool · University of Warwick

Evaluation framework shows anti-DreamBooth adversarial image protections are trivially defeated by purification, enabling facial identity leakage

Output Integrity Attack vision generative
PDF Code
benchmark arXiv Nov 3, 2025 · Nov 2025

Probabilistic Robustness for Free? Revisiting Training via a Benchmark

Yi Zhang, Zheng Wang, Zhen Chen et al. · University of Warwick · University of Liverpool +2 more

Benchmarks adversarial and probabilistic robustness training methods, finding that adversarial training (AT) improves both adversarial robustness (AR) and probabilistic robustness (PR) at no extra cost

Input Manipulation Attack vision
1 citation PDF Code
tool arXiv Oct 21, 2025 · Oct 2025

Robustness Verification of Graph Neural Networks Via Lightweight Satisfiability Testing

Chia-Hsuan Lu, Tony Tan, Michael Benedikt · arXiv · University of Oxford +1 more

Verifies GNN robustness against structural adversarial perturbations using polynomial-time partial SAT solvers instead of mixed-integer programming (MIP)

Input Manipulation Attack graph
1 citation PDF Code
defense arXiv Sep 30, 2025 · Sep 2025

Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier

Gaojie Jin, Xinping Yi, Xiaowei Huang · University of Exeter · Southeast University +1 more

Derives PAC-Bayesian certified robustness bounds for smoothed majority vote classifiers and proposes spectral regularization to improve robustness-accuracy tradeoff

Input Manipulation Attack vision
1 citation PDF
defense arXiv Aug 30, 2025 · Aug 2025

Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models

Sihao Wu, Gaojie Jin, Wei Huang et al. · University of Liverpool · University of Exeter +2 more

Defends VLMs against visual adversarial jailbreaks via adaptive activation steering vectors refined through sequence-level preference optimization

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
attack arXiv Aug 23, 2025 · Aug 2025

POT: Inducing Overthinking in LLMs via Black-Box Iterative Optimization

Xinyu Li, Tianjin Huang, Ronghui Mu et al. · University of Exeter · University of Liverpool

Black-box adversarial prompts exploit chain-of-thought (CoT) reasoning to inflate LLM token generation and exhaust compute resources

Model Denial of Service nlp
PDF
defense arXiv Aug 6, 2025 · Aug 2025

RAIDX: A Retrieval-Augmented Generation and GRPO Reinforcement Learning Framework for Explainable Deepfake Detection

Tianxiao Li, Zhenglin Huang, Haiquan Wen et al. · University of Liverpool · Beihang University +1 more

Novel explainable deepfake detector combining retrieval-augmented generation and GRPO RL to produce saliency maps and textual rationales

Output Integrity Attack vision multimodal
PDF