Ferdinando Fioretto

h-index: 13 · 748 citations · 74 papers (total)

Papers in Database (1)

defense · arXiv · Jan 21, 2026

NeuroFilter: Privacy Guardrails for Conversational LLM Agents

Saswat Das, Ferdinando Fioretto · University of Virginia

Activation-space guardrail detects privacy-violating LLM agent prompts using linear probes and cumulative drift across multi-turn conversations

Prompt Injection · Sensitive Information Disclosure · nlp