Keita Saito

h-index: 1 · 65 citations · 2 papers (total)

Papers in Database (1)

attack · arXiv · Nov 12, 2025 · Nov 2025

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

Shigeki Kusaka, Keita Saito, Mikoto Kudo et al. · University of Tsukuba · RIKEN +2 more

Theoretically minimizes label-flipping attack cost during RLHF/DPO alignment using convex optimization post-processing

Data Poisoning Attack · Training Data Poisoning · nlp
1 citation