Iryna Gurevych

Papers in Database (1)

attack arXiv Jan 3, 2025 · Jan 2025

Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions

Rachneet Sachdeva, Rima Hazra, Iryna Gurevych · Technical University of Darmstadt

Proposes POATE, a jailbreak technique that uses polar-opposite contrastive queries to bypass LLM safety defenses, achieving a 44% higher attack success rate than prior methods

Prompt Injection nlp