Wei Xiao

h-index: 2 · 31 citations · 5 papers (total)

Papers in Database (1)

defense · arXiv · Feb 23, 2026

BarrierSteer: LLM Safety via Learning Barrier Steering

Thanh Q. Tran, Arun Verma, Kiwan Wong et al. · National University of Singapore · Singapore-MIT Alliance for Research and Technology Centre +2 more

Defends LLMs against jailbreaks and adversarial attacks by enforcing control barrier function (CBF) safety constraints in latent representation space at inference time.

Input Manipulation Attack · Prompt Injection · nlp