Latest papers

3 papers
attack · arXiv · Nov 12, 2025

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

Shigeki Kusaka, Keita Saito, Mikoto Kudo et al. · University of Tsukuba · RIKEN +2 more

Theoretically minimizes label-flipping attack cost during RLHF/DPO alignment using convex optimization post-processing

Data Poisoning Attack · Training Data Poisoning · nlp
1 citation · PDF · Code
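The attack family named above can be sketched generically: a label-flipping poison against DPO-style alignment swaps the chosen/rejected responses in a subset of preference pairs, subject to a cost budget. The greedy budgeted selection below is an illustrative stand-in for the paper's convex-optimization procedure, and all names are hypothetical:

```python
# Generic sketch of label-flipping poisoning against DPO-style preference
# data. The per-pair cost model and greedy cheapest-first selection are
# illustrative assumptions, not the paper's convex-optimization method.
def flip_preferences(pairs, costs, budget):
    """pairs: list of (chosen, rejected); costs: per-pair flip cost.

    Flip the cheapest pairs first until the budget is exhausted."""
    order = sorted(range(len(pairs)), key=lambda i: costs[i])
    poisoned = list(pairs)
    spent = 0.0
    for i in order:
        if spent + costs[i] > budget:
            break
        chosen, rejected = poisoned[i]
        poisoned[i] = (rejected, chosen)  # swap the preference label
        spent += costs[i]
    return poisoned

pairs = [("helpful", "harmful"), ("safe", "unsafe"), ("good", "bad")]
costs = [0.5, 2.0, 1.0]
print(flip_preferences(pairs, costs, budget=1.6))
```

With budget 1.6 only the two cheapest pairs (costs 0.5 and 1.0) are flipped; the remaining pair is left intact.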
benchmark · arXiv · Oct 1, 2025

Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda et al. · Turing Inc. · Institute of Science Tokyo +2 more

Reveals that Differential Attention transformers are structurally more fragile to adversarial perturbations than standard attention, via negative gradient alignment theory

Input Manipulation Attack · vision · multimodal
PDF
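For context, Differential Attention (as in the Diff Transformer) replaces a single softmax attention map with the difference of two. A minimal NumPy sketch, with a fixed `lam` standing in for the architecture's learnable reweighting:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    """Differential attention: the difference of two softmax maps.

    The subtraction is meant to cancel common-mode "noise" attention,
    but it also lets perturbations drive entries negative -- the kind of
    structural sensitivity the paper analyzes."""
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))
    a2 = softmax(q2 @ k2.T / np.sqrt(d))
    return (a1 - lam * a2) @ v
```

Standard attention is recovered by setting `lam = 0`, which makes the two designs easy to compare under the same perturbation.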
attack · arXiv · Jan 4, 2025

BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors

Chia-Yi Hsu, Yu-Lin Tsai, Yu Zhe et al. · National Yang Ming Chiao Tung University · University of Tsukuba +2 more

Backdoor attack on task vectors that persists across task learning, forgetting, and analogy arithmetic operations, evading all tested defenses

Model Poisoning · Transfer Learning Attack · vision · nlp · multimodal
2 citations · PDF
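For background, the task-vector operations named in the summary (learning, forgetting, analogy arithmetic) can be sketched with toy weight arrays: a task vector is the fine-tuned weights minus the pre-trained weights, and downstream users compose vectors by simple arithmetic. The point is that a backdoor embedded in one vector is carried along by every composition; the numbers below are purely illustrative.

```python
import numpy as np

# Toy weights: pre-trained base and two fine-tuned models.
base = np.array([0.0, 0.0, 0.0])
ft_a = np.array([1.0, 0.0, 0.0])  # fine-tuned on task A
ft_b = np.array([0.0, 1.0, 0.5])  # fine-tuned on task B

# Task vectors: fine-tuned minus pre-trained weights.
tau_a = ft_a - base
tau_b = ft_b - base

learn = base + tau_a + tau_b              # task learning: add vectors
forget = learn - tau_a                    # task forgetting: subtract one
analogy = base + tau_b + (tau_b - tau_a)  # analogy arithmetic (illustrative)
```

If `tau_b` came from a poisoned third party, the backdoor direction survives in `learn`, `forget`, and `analogy` alike, which is the persistence the summary describes.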