attack 2025

Beyond Context: Large Language Models Failure to Grasp Users Intent

Ahmed M. Hussain , Salahuddin Salahuddin , Panos Papadimitratos

1 citations · 114 references · arXiv

α

Published on arXiv

2512.21110

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Reasoning-enabled LLM configurations amplified jailbreak success by increasing factual precision, while Claude Opus 4.1 was the only model to prioritize intent detection over information provision in some cases.


Current Large Language Models (LLMs) safety approaches focus on explicitly harmful content while overlooking a critical vulnerability: the inability to understand context and recognize user intent. This creates exploitable vulnerabilities that malicious users can systematically leverage to circumvent safety mechanisms. We empirically evaluate multiple state-of-the-art LLMs, including ChatGPT, Claude, Gemini, and DeepSeek. Our analysis demonstrates the circumvention of reliable safety mechanisms through emotional framing, progressive revelation, and academic justification techniques. Notably, reasoning-enabled configurations amplified rather than mitigated the effectiveness of exploitation, increasing factual precision while failing to interrogate the underlying intent. The exception was Claude Opus 4.1, which prioritized intent detection over information provision in some use cases. This pattern reveals that current architectural designs create systematic vulnerabilities. These limitations require paradigmatic shifts toward contextual understanding and intent recognition as core safety capabilities rather than post-hoc protective mechanisms.


Key Contributions

  • Identifies intent recognition failure (not just content filtering gaps) as a systematic architectural vulnerability in current LLMs
  • Demonstrates three exploitation techniques — emotional framing, progressive revelation, and academic justification — that reliably bypass safety mechanisms across ChatGPT, Claude, Gemini, and DeepSeek
  • Finds that reasoning-enabled model configurations amplify rather than mitigate jailbreak effectiveness by increasing factual precision without interrogating underlying user intent

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_time
Applications
conversational aillm safety systems