Shaopeng Fu

Papers in Database (1)

defense · arXiv · Apr 14, 2026

Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory

Shaopeng Fu, Di Wang · King Abdullah University of Science and Technology

Proves why continuous adversarial training defends LLMs against jailbreaks and proposes embedding regularization for better robustness

Input Manipulation Attack · Prompt Injection · nlp