Firstname8 Lastname8

Papers in Database (2)

defense arXiv Feb 4, 2026

RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models

Firstname1 Lastname1, Firstname2 Lastname2, Firstname3 Lastname3 et al. · University of YYY · Company Name +1 more

Expert-level safety alignment for MoE LLMs that surgically repairs jailbreak-activated experts to defeat routing-based bypasses

Prompt Injection nlp
defense arXiv Feb 16, 2026

Closing the Distribution Gap in Adversarial Training for LLMs

Firstname1 Lastname1, Firstname2 Lastname2, Firstname3 Lastname3 et al. · University of YYY · Company Name +1 more

Proposes Distributional Adversarial Training using Diffusion LLMs to close coverage gaps and harden LLMs against natural-language jailbreaks

Prompt Injection nlp generative