Latest papers

1 paper
attack · arXiv · Dec 12, 2025

Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously

Andrew Adiletta, Kathryn Adiletta, Kemal Derya et al. · MITRE · Worcester Polytechnic Institute

Adversarial token suffixes that bypass LLM alignment and safety guard models simultaneously via joint gradient optimization

Input Manipulation Attack · Prompt Injection · nlp