Maheep Chaudhary

defense arXiv Nov 11, 2025

Shourya Batra, Pierce Tillman, Samarth Gaggar et al. · Independent · Algoverse +3 more

Activation steering defense that reduces sensitive user data leakage in LLM chain-of-thought reasoning traces at inference time

Sensitive Information Disclosure nlp

4 citations · 1 influential

defense arXiv Feb 21, 2026

Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav et al.

Diffusion-based defense projects LLM hidden states onto benign manifolds at inference time to neutralize jailbreak attacks

Input Manipulation Attack Prompt Injection nlp

Papers in Database (2)