Chris Ngo

attack arXiv Jan 27, 2026

Quy-Anh Dang, Chris Ngo · VNU University of Science · Knovel Engineering Lab

Norm-preserving activation steering attack bypasses LLM safety alignment with 5.5x higher jailbreak success than prior methods

Prompt Injection nlp

benchmark arXiv Jan 7, 2026

Quy-Anh Dang, Chris Ngo, Truong-Son Hy · VNU University of Science · Knovel +1 more

Aggregates 37 red-teaming datasets into a unified LLM benchmark with standardized taxonomy across 22 risk categories

Prompt Injection nlp

Papers in Database (2)