ML Security Papers

Latest papers

1 paper

attack · arXiv · Dec 12, 2025

Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously

Andrew Adiletta, Kathryn Adiletta, Kemal Derya et al. · MITRE · Worcester Polytechnic Institute

Adversarial token suffixes that bypass LLM alignment and safety guard models simultaneously via joint gradient optimization

Input Manipulation Attack · Prompt Injection · nlp