Defense · 2026

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

Justin Albrethsen 1, Yash Datta 1, Kunal Kumar 1,2, Sharath Rajasekar 1

0 citations · 31 references · arXiv (Cornell University)


Published on arXiv · 2602.16935

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves a state-of-the-art F1 of 0.84 for multi-turn jailbreak detection, substantially outperforming leading guardrail models (Llama-Prompt-Guard-2 and Granite-Guardian, both at 0.67), with sub-20ms inference latency.

DeepContext

Novel technique introduced


While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of disconnected events. This lack of temporal awareness opens a "Safety Gap" in which adversarial tactics, such as Crescendo and ActorAttack, slowly bleed malicious intent across turn boundaries to bypass stateless filters. We introduce DeepContext, a stateful monitoring framework designed to track the temporal trajectory of user intent. DeepContext discards isolated per-turn evaluation in favor of a Recurrent Neural Network (RNN) architecture that ingests a sequence of fine-tuned turn-level embeddings. By propagating a hidden state across the conversation, DeepContext captures the incremental accumulation of risk that stateless models overlook. Our evaluation demonstrates that DeepContext significantly outperforms existing baselines in multi-turn jailbreak detection, achieving a state-of-the-art F1 score of 0.84, a substantial improvement over both hyperscaler cloud-provider guardrails and leading open-weight models such as Llama-Prompt-Guard-2 (0.67) and Granite-Guardian (0.67). Furthermore, DeepContext maintains a sub-20ms inference overhead on a T4 GPU, ensuring viability for real-time applications. These results suggest that modeling the sequential evolution of intent is a more effective and computationally efficient alternative to deploying massive, stateless models.


Key Contributions

  • Stateful RNN-based monitoring framework that propagates a hidden state across conversation turns to detect gradual accumulation of adversarial intent
  • Fine-tuned turn-level embeddings that encode per-turn risk signals as input to the recurrent architecture
  • Real-time deployment feasibility demonstrated via sub-20ms inference on a T4 GPU, with F1=0.84 outperforming Llama-Prompt-Guard-2 and Granite-Guardian (both 0.67)
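The stateful monitoring idea above can be sketched in a few lines. The summary does not specify the RNN cell, embedding model, or dimensions, so this minimal NumPy sketch assumes a GRU cell with random stand-in weights, hypothetical sizes (`EMB`, `HID`), and a sigmoid readout that maps the hidden state to a per-turn risk score; a real deployment would use trained parameters and the paper's fine-tuned turn embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID = 8, 16  # hypothetical embedding / hidden-state sizes

# Random weights stand in for trained GRU and readout parameters.
Wz, Uz = rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID))
Wr, Ur = rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID))
Wh, Uh = rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID))
w_out = rng.normal(size=HID)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def risk_trajectory(turn_embeddings):
    """Propagate a hidden state across turns; emit a risk score per turn."""
    h = np.zeros(HID)
    scores = []
    for e in turn_embeddings:
        z = sigmoid(Wz @ e + Uz @ h)           # update gate
        r = sigmoid(Wr @ e + Ur @ h)           # reset gate
        h_tilde = np.tanh(Wh @ e + Uh @ (r * h))
        h = (1 - z) * h + z * h_tilde          # stateful accumulation
        scores.append(float(sigmoid(w_out @ h)))  # per-turn risk readout
    return scores

# Stand-in embeddings for a 5-turn conversation.
conv = [rng.normal(size=EMB) for _ in range(5)]
scores = risk_trajectory(conv)
flagged = any(s > 0.5 for s in scores)  # flag once accumulated risk crosses a threshold
```

Because the hidden state carries forward, a turn that looks benign in isolation can still push the score over the threshold once earlier turns have primed the state, which is the failure mode stateless per-turn filters miss.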

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, rnn, transformer
Threat Tags
black_box, inference_time
Applications
llm safety guardrails, real-time jailbreak detection, conversational ai safety