Jason Vega

Defense · arXiv · Dec 5, 2025

Matching Ranks Over Probability Yields Truly Deep Safety Alignment

Jason Vega, Gagandeep Singh · University of Illinois Urbana-Champaign

Proposes RAP, an attack that bypasses deep safety alignment defenses for LLMs via rank-guided token selection, then counters it with PRESTO, an attention-regularization defense.

Prompt Injection · NLP