Latest papers

4 papers
defense · arXiv · Dec 15, 2025

Learning to Generate Cross-Task Unexploitable Examples

Haoxuan Qu, Qiuchi Xiang, Yujun Cai et al. · Lancaster University · The University of Queensland +2 more

Protects personal images from unauthorized ML training by adding imperceptible perturbations that make the data unlearnable across diverse vision tasks

Data Poisoning Attack · vision

attack · arXiv · Nov 21, 2025

Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models

Zhiyuan Xu, Stanislav Abaimov, Joseph Gardiner et al. · University of Bristol

Novel activation-space attack exploits compression valleys in LLMs to steer behavior toward harmful outputs while evading conventional input/weight audits

Input Manipulation Attack · Prompt Injection · nlp

defense · arXiv · Nov 15, 2025

Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks

Rui Wang, Zeming Wei, Xiyue Zhang et al. · Peking University · University of Bristol

Defends DNNs against unseen adversarial attacks by using a multi-armed bandit to dynamically sample attack types during adversarial training

Input Manipulation Attack · vision

defense · arXiv · Aug 30, 2025

Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models

Sihao Wu, Gaojie Jin, Wei Huang et al. · University of Liverpool · University of Exeter +2 more

Defends VLMs against visual adversarial jailbreaks via adaptive activation steering vectors refined through sequence-level preference optimization

Input Manipulation Attack · Prompt Injection · multimodal · vision · nlp