Chris Ngo

h-index: 2 · 59 citations · 5 papers (total)

Papers in Database (2)

attack · arXiv · Jan 27, 2026 · 9w ago

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Quy-Anh Dang, Chris Ngo · VNU University of Science · Knovel Engineering Lab

Norm-preserving activation steering attack bypasses LLM safety alignment with 5.5x higher jailbreak success than prior methods

Prompt Injection nlp
benchmark · arXiv · Jan 7, 2026 · 12w ago

RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

Quy-Anh Dang, Chris Ngo, Truong-Son Hy · VNU University of Science · Knovel +1 more

Aggregates 37 red-teaming datasets into a unified LLM benchmark with standardized taxonomy across 22 risk categories

Prompt Injection nlp