Gaojie Jin

Papers in Database (3)

defense arXiv Mar 1, 2026

S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights

Gaojie Jin, Xinping Yi, Wei Huang et al. · University of Exeter · Southeast University +1 more

Improves adversarial training robustness by optimizing second-order weight statistics via a tightened PAC-Bayesian bound

Input Manipulation Attack vision
defense arXiv Aug 30, 2025

Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models

Sihao Wu, Gaojie Jin, Wei Huang et al. · University of Liverpool · University of Exeter +2 more

Defends VLMs against visual adversarial jailbreaks via adaptive activation steering vectors refined through sequence-level preference optimization

Input Manipulation Attack Prompt Injection multimodal vision nlp
attack arXiv Aug 23, 2025

POT: Inducing Overthinking in LLMs via Black-Box Iterative Optimization

Xinyu Li, Tianjin Huang, Ronghui Mu et al. · University of Exeter · University of Liverpool

Black-box adversarial prompts exploit CoT reasoning to inflate LLM token generation and exhaust compute resources

Model Denial of Service nlp