Mubarak Shah

Papers in Database (1)

defense · arXiv · Apr 10, 2026

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

Jinqi Luo, Jinyu Yang, Tal Neiman et al. · University of Pennsylvania · Amazon +1 more

Activation steering defense using sparse autoencoders and concept dictionaries to safeguard multimodal LLMs against jailbreaks

Prompt Injection · nlp · vision · multimodal