Byron C. Wallace

h-index: 3 · 137 citations · 7 papers (total)

Papers in Database (1)

benchmark · arXiv · Sep 25, 2025 · Sep 2025

Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models

Chantal Shaib, Vinith M. Suriyakumar, Levent Sagun et al. · Northeastern University · MIT +1 more

Exploits learned syntactic-domain correlations to bypass LLM safety refusals via malformed or domain-mismatched prompts

Prompt Injection · nlp
2 citations