Latest papers

5 papers
attack arXiv Feb 3, 2026

Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks

Yi Yu, Qixin Zhang, Shuhan Ye et al. · Nanyang Technological University · Chinese University of Hong Kong +2 more

Gradient-based timing-only adversarial attack on event-driven SNNs retimes spikes to cause misclassification while preserving spike counts

Input Manipulation Attack vision
2 citations PDF Code
defense International Journal of Compu... Nov 14, 2025 · Nov 2025

Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm

Fuxiang Huang, Xiaowei Fu, Shiyu Ye et al. · Chongqing University · Lingnan University +3 more

Defends unsupervised domain adaptation models against adversarial attacks via disentangled distillation post-training

Input Manipulation Attack vision
PDF
defense arXiv Oct 9, 2025 · Oct 2025

MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation

Weisen Jiang, Sinno Jialin Pan · Chinese University of Hong Kong

Two-stage LLM guardrail defends against finetuning-based jailbreaks by detecting harmful queries before and during generation

Transfer Learning Attack Prompt Injection nlp
2 citations 1 influential PDF Code
benchmark arXiv Oct 8, 2025 · Oct 2025

Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent

Weidi Luo, Qiming Zhang, Tianyu Lu et al. · University of Georgia · University of Wisconsin–Madison +6 more

Benchmarks LLM-powered agents' ability to execute end-to-end enterprise intrusions aligned with MITRE ATT&CK TTPs

Excessive Agency Prompt Injection nlp multimodal
4 citations PDF Code
defense arXiv Sep 29, 2025 · Sep 2025

DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models

Zherui Li, Zheng Nie, Zhenhong Zhou et al. · Beijing University of Posts and Telecommunications · National University of Singapore +5 more

Defends diffusion LLMs against jailbreaks by correcting greedy remasking bias and applying block-level autonomous safety repair

Prompt Injection nlp
3 citations 2 influential PDF Code