Latest papers

3 papers
attack · arXiv · Nov 13, 2025

Trapped by Their Own Light: Deployable and Stealth Retroreflective Patch Attacks on Traffic Sign Recognition Systems

Go Tsuruoka, Takami Sato, Qi Alfred Chen et al. · Waseda University · University of California +2 more

Physical retroreflective adversarial patch on traffic signs achieves 93.4% attack success while remaining visually indistinguishable from benign signs

Input Manipulation Attack · vision
attack · arXiv · Nov 12, 2025

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

Shigeki Kusaka, Keita Saito, Mikoto Kudo et al. · University of Tsukuba · RIKEN +2 more

Derives the theoretical minimum cost of a label-flipping attack on RLHF/DPO alignment and uses convex-optimization post-processing to reduce the cost of existing attacks

Data Poisoning Attack · Training Data Poisoning · nlp
1 citation
defense · arXiv · Oct 1, 2025

Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability

Shojiro Yamabe, Jun Sakuma · Institute of Science Tokyo · RIKEN

Discovers a token-injection jailbreak in diffusion LMs and proposes a safety-alignment method that defends against contaminated intermediate denoising states

Input Manipulation Attack · Prompt Injection · nlp