Chirag Agarwal

Papers in Database (1)

defense · arXiv · Apr 20, 2026

Towards Understanding the Robustness of Sparse Autoencoders

Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal · University of Virginia

Uses Sparse Autoencoders as an inference-time jailbreak defense, achieving a 5x reduction in attack success via a representational bottleneck

Input Manipulation Attack · Prompt Injection · nlp